Preface


This volume is the proceedings of the 13th International Conference on Theo- 
rem Proving in Higher Order Logics (TPHOLs 2000) held 14-18 August 2000 
in Portland, Oregon, USA. Each of the 55 papers submitted in the full rese- 
arch category was refereed by at least three reviewers who were selected by the 
program committee. Because of the limited space available in the program and 
proceedings, only 29 papers were accepted for presentation and publication in 
this volume. 

In keeping with tradition, TPHOLs 2000 also offered a venue for the presen- 
tation of work in progress, where researchers invite discussion by means of a brief 
preliminary talk and then discuss their work at a poster session. A supplemen- 
tary proceedings containing associated papers for work in progress was published 
by the Oregon Graduate Institute (OGI) as technical report CSE-00-009. 

The organizers are grateful to Bob Colwell, Robin Milner and Larry Wos for 
agreeing to give invited talks. Bob Colwell was the lead architect on the Intel 
P6 microarchitecture, which introduced a number of innovative techniques and 
achieved enormous commercial success. As such, he is ideally placed to offer 
an industrial perspective on the challenges for formal verification. Robin Milner 
contributed many key ideas to computer theorem proving, and to functional 
programming, through his leadership of the influential Edinburgh LCF project. 
In addition he is known for his work on general theories of concurrency, and 
his invited talk brings both these major themes together. Larry Wos was the 
developer of many of the fundamental approaches to automated proof in first 
order logic with equality. He also led the way in applying automated reasoning to 
solving open mathematical problems, and here he discusses some achievements 
of this project and future prospects. 

The TPHOLs conference traditionally changes continent each year in order 
to maximize the chances that researchers all over the world can attend. Starting 
in 1993, the proceedings of TPHOLs or its predecessor have been published in 
the following volumes of the Springer-Verlag Lecture Notes in Computer Science 
series: 


1993 (Canada)   780      1997 (USA)       1275
1994 (Malta)    859      1998 (Australia) 1479
1995 (USA)      971      1999 (France)    1690
1996 (Finland) 1125


The 2000 conference was organized by a team from Intel Corporation and 
the Oregon Graduate Institute. Financial support came from Compaq, IBM, In- 
tel, Levetate, Synopsys and OGI. A generous grant from the National Science 
Foundation allowed the organizers to offer student bursaries covering part of the 
cost of attending TPHOLs. The support of all these organizations is gratefully 
acknowledged. 


May 2000 Mark Aagaard, John Harrison 
Fix-Point Equations for Well-Founded Recursion
in Type Theory 


Antonia Balaa and Yves Bertot 


INRIA Sophia-Antipolis 
http://www-sop.inria.fr/lemme/{Antonia.Balaa,Yves.Bertot}


1 Introduction 


Inductive type theories, as used in systems like Coq or Lego [11,14,4], provide
a systematic approach to program recursive functions over inductive
data-structures and to reason about these functions. Recursive computation is
described by reduction rules, included in the type system under the name
$\iota$-reduction. If t is an element of a recursive type, f is a recursive
function over that type and v is the value of f(t), then the equality f(t) = v
is a simple tautology, because f(t) and v are equal modulo $\iota$-reduction.

In practice, the inductive data-structures on which recursive functions may 
be defined are not restricted to simple concrete data-types, such as term alge- 
bras or finite-branching trees. Infinite branching is also allowed, so that some 
inductive types are powerful enough to describe the more complicated notions 
of terminating functions [9]. The intuitive notion is that of well-founded orders, 
for which there exists no infinite descending chain. If we describe a function 
such that recursive calls are performed only on terms that are smaller than the 
initial argument for some well-founded order, then we are sure there cannot be 
an infinite sequence of recursive calls. 

For a user intending to describe complex programs, this is good news: it
becomes possible to write algorithms without having to follow a tedious encoding
using primitive structural recursion. Pleasant and efficient descriptions of
complex algorithms become possible, for example, algorithms for computing Gröbner
bases, as described by Théry [16] and Coquand and Persson [5].

In practice, using a function defined by well-founded recursion can also be 
unwieldy. In some sense, these functions are also defined by structural recursion, 
but the recursion follows the structure of the proof that there is no infinite 
descending chain starting from the function’s argument. To actually use the 
reduction rules, one needs a full description of how the proof was constructed 
and it is not enough to simply know that such a proof exists. 

This work provides a fix-point equality theorem that represents the reduction 
rule and can be used without any knowledge of the proofs’ structure. The first 
motivation is practical in nature. Without a fix-point equality, it is hard to collect 
information about a function. In a system like Coq, the fix-point equation is only 
a consequence of the encoding based on generalized inductive data-types. With 
our work to generate this equality automatically, it becomes simpler to reason 
about well-founded recursive functions in Coq. 


J. Harrison and M. Aagaard (Eds.): TPHOLs 2000, LNCS 1869, pp. 1-16, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 
1.1 Related Work


This work draws its theoretical and practical background from all the work 
done around inductive definitions in mathematics and computer based theorem 
proving tools. From the theoretical point of view, the main entry point may be 
the work by P. Aczel [1]. The immersion of inductive definitions in type theory 
based systems was mostly studied in the 1980s, for instance by Constable
and Mendler in [3,2]. The flavor we use in this paper is mostly described by
Coquand, Pfenning, and Paulin-Mohring in [14,4,11]. General use of well-founded
recursion in Martin-Löf's intuitionistic type theory was studied by Paulson in
[12], who shows that reduction rules can be obtained for each of several means to 
construct well-founded relations from previously known well-founded relations. 
By comparison with Paulson’s work, our technique is to obtain reduction rules 
that are specific to each recursive function. The introduction of well-founded 
recursion using an accessibility principle as used in this paper was described by 
Nordström in [9].

Inductive definitions and inductive types also appear in proof systems based 
on simply-typed higher-order logic, such as HOL [6] or Isabelle [13]. Camilleri 
and Melham provide a package to systematize the definition of inductive relations 
in the HOL system [8], but this is not powerful enough to describe the notion 
of accessibility used in this paper. Harrison [7] provides a more practical tool, 
powerful enough to describe the necessary elements for well-founded recursion 
as described in this paper. In particular, the transfer theorem described below 
in section 3.1 is also described in Harrison’s work. Konrad Slind [15] also gives 
a package to ease the definition of recursive functions based on well-founded 
induction, making sure his package will be general enough to be used with two 
different proof systems. The work of Harrison and Slind also gives solutions
to the question of constructing fix-point equations (they are called recursion
equations in [15]), but the main question here is whether the logical foundations
of type theory, with constructivity and without extensionality, are sufficient to
recover these equations. Harrison and Slind do not cover this question, using
extensionality and non-constructive features freely.


2 Detailed Description of the Problem 


In this section, we describe the inductive objects of type theory as used in the
calculus of constructions with inductive definitions
[11]. We show the differences that exist between structural recursive functions 
and well-founded recursive functions. 

In type theory, types are used to represent both propositions and datatypes. 
To distinguish between these, sorts are provided, that are used to denote the 
types of types. The sort Set will be used to type the types representing data- 
types, while the sort Prop will be used for the types representing propositions. 
Inductive definitions can be used to construct inductive datatypes, for instance 
the type of natural numbers given below, or to construct inductive propositions, 
for instance the proposition of accessibility given below. Types can be returned 
by functions. When the returned value is in Prop, a function is actually a
proposition; we shall use the letter P to range over such propositions. When the
returned value is in Set, a function denotes a dependent type. We shall use the
letter B to range over such functions.

An inductive type is specified by giving a name, a type, and the name and 
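type of its constructors. For instance, the type nat of natural numbers is
constructed using an inductive definition with two constructors 0 and S.

\[
\begin{array}{l}
\mathsf{Inductive}\ \mathit{nat} : \mathit{Set} := \\
\quad\ \ 0 : \mathit{nat} \\
\quad |\ S : \mathit{nat} \to \mathit{nat}.
\end{array}
\]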

It is possible to define structurally recursive functions over an inductive type. 
For a given function, the user needs to provide values for the constructors that 
are not really recursive and functions that compute the final value from the 
values of recursive calls of sub-terms belonging to the inductive type for the 
other constructors. For natural numbers, there exists an operation nat_rec whose 
behavior is described by the following two reduction rules (we use the notation 
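$(f\ x_1\ x_2)$ to represent the application of f to two arguments):

\[
\begin{array}{l}
(\mathit{nat\_rec}\ t\ v_0\ f\ 0) \rightsquigarrow v_0 \\
(\mathit{nat\_rec}\ t\ v_0\ f\ (S\ x)) \rightsquigarrow (f\ x\ (\mathit{nat\_rec}\ t\ v_0\ f\ x))
\end{array}
\]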


The term nat_rec is a plain object of the typed calculus, its type takes into 
account the fact that one may define a function with a dependent type B. The 
$\Pi$ notation is used to express that the type of the value returned by a function
depends on the value received as argument by this function. When the type is
used as a proposition, we will replace the $\Pi$ notation with a more intuitive $\forall$.

\[
\begin{array}{l}
\mathit{nat\_rec} : \\
\quad \Pi B : \mathit{nat} \to \mathit{Set}. \\
\quad (B\ 0) \to \\
\quad (\Pi x : \mathit{nat}.\ (B\ x) \to (B\ (S\ x))) \to (\Pi x : \mathit{nat}.\ (B\ x)).
\end{array}
\]
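
As a small illustration of the two reduction rules, the sketch below uses the nat_rec operator of the Coq standard library, written in the syntax of current Coq releases rather than the syntax used elsewhere in this paper. The function double is a hypothetical example that does not come from the paper. Both lemmas are closed by reflexivity alone, which is what it means for the equalities to hold modulo reduction.

```coq
(* Hypothetical example: doubling defined with the structural recursor. *)
Definition double (n : nat) : nat :=
  nat_rec (fun _ : nat => nat) 0 (fun (_ : nat) (r : nat) => S (S r)) n.

(* First reduction rule: the recursor applied to 0 yields the base value. *)
Lemma double_0 : double 0 = 0.
Proof. reflexivity. Qed.

(* Second reduction rule: one unrolling step on a successor. *)
Lemma double_S : forall x : nat, double (S x) = S (S (double x)).
Proof. intro x. reflexivity. Qed.
```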


Inductive types may be more general. Consider the notion of accessibility, 
as described in [9], and defined as follows: given a binary relation R over a 
set A, an element $a \in A$ is accessible if every element smaller than a (we use
a terminology where R is used as an order, but it does not even need to be 
transitive) is accessible. In particular, all minimal elements are accessible. The 
notion of accessibility can be defined with the following inductive definition: 


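\[
\begin{array}{l}
\mathsf{Inductive}\ \mathit{Acc}\,[A : \mathit{Set};\ R : A \to A \to \mathit{Prop}] : A \to \mathit{Prop} := \\
\quad \mathit{Acc\_intro} : \forall x : A.\ (\forall y : A.\ (R\ y\ x) \to (\mathit{Acc}\ A\ R\ y)) \to (\mathit{Acc}\ A\ R\ x).
\end{array}
\]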
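
The remark that all minimal elements are accessible can be checked directly. The following sketch, in the syntax of current Coq releases, relies on the Acc predicate and its constructor Acc_intro from the standard library; the lemma name minimal_acc is an assumption made for this illustration.

```coq
(* An element with no predecessor for R is accessible: the hypothesis
   required by Acc_intro holds vacuously. *)
Lemma minimal_acc (A : Type) (R : A -> A -> Prop) (x : A) :
  (forall y : A, ~ R y x) -> Acc R x.
Proof.
  intro Hmin.
  apply Acc_intro.
  intros y Hy.
  (* Hmin y Hy is a proof of False, so this case cannot occur. *)
  destruct (Hmin y Hy).
Qed.
```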


To make the notation easier, we shall assume that we work with a fixed 
relation, which we shall denote < instead of R. Here, the recursion operation is 
described by a single reduction rule: 


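\[
\begin{array}{l}
(\mathit{Acc\_rec}\ t\ \Phi\ x\ (\mathit{Acc\_intro}\ x\ h)) \rightsquigarrow \\
\quad (\Phi\ x\ h\ \lambda y : A.\lambda p : y < x.(\mathit{Acc\_rec}\ t\ \Phi\ y\ (h\ y\ p)))
\end{array}
\]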
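Acc_rec takes the following type:

\[
\begin{array}{l}
\mathit{Acc\_rec} : \Pi B : A \to \mathit{Set}. \\
\quad (\Pi x : A. \\
\qquad (\forall y : A.\ y < x \to (\mathit{Acc}\ y)) \to \\
\qquad (\Pi y : A.\ y < x \to (B\ y)) \to (B\ x)) \to \\
\quad \Pi x : A.\ (\mathit{Acc}\ x) \to (B\ x)
\end{array}
\]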


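We often omit the first argument of Acc_rec, represented here by the formal
parameter B of type $A \to \mathit{Set}$. The function $\Phi$ must have the following type:

\[
\begin{array}{l}
\Phi : (\Pi x : A.\ (\forall y : A.\ y < x \to (\mathit{Acc}\ y)) \to \\
\qquad (\Pi y : A.\ y < x \to (B\ y)) \to (B\ x))
\end{array}
\]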


The function $\Phi$ takes three values as arguments, an object x, a proof that all
elements smaller than x are accessible, and a function that can only be called
on objects that are smaller than x. The reduction rule expresses that when
computing the value $(\mathit{Acc\_rec}\ t\ \Phi\ x\ q)$, you can unroll the recursion one step,
replacing it with a call to $\Phi$ on the right arguments, but you can perform this
step only when q is of the form $(\mathit{Acc\_intro}\ x\ h)$.

Accessibility describes those elements from which one cannot start an infinite 
descending chain. A relation is well-founded when the accessibility predicate is 
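verified for all elements of the type. There is then a theorem $wf_<$ that states
$wf_< : \forall x : A.\ (\mathit{Acc}\ x)$. With a well-founded relation, it is possible to construct a
new function $\mathrm{Rec}_{wf_<}$ with the following value:

\[
\begin{array}{l}
(\mathrm{Rec}_{wf_<}\ F\ x) = \\
\quad (\mathit{Acc\_rec}\ (\lambda z : A.\lambda h : (\forall y : A.\ y < z \to (\mathit{Acc}\ y)).(F\ z))\ x\ (wf_<\ x))
\end{array}
\]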


The function F must have the following type:

\[ F : \Pi x : A.\ (\Pi y : A.\ y < x \to (B\ y)) \to (B\ x). \]


This function $\mathrm{Rec}_{wf_<}$ is used to define functions by well-founded recursion.
But does it have a reduction behavior that is similar to a recursion operator? At
first sight no, since the proof that x is accessible is of the form $(wf_<\ x)$, not an
instance of Acc_intro, the form that would be needed to use the $\iota$-reduction of
Acc_rec. Under closer scrutiny however, it is possible to prove that every object
of an inductive type is obtained by applying one of the constructors of this type
to proper arguments. Here there is only one constructor to the type Acc and one
can deduce there exists a hypothesis h stating $\forall y : A.\ y < x \to (\mathit{Acc}\ y)$ such that
the proof $(wf_<\ x)$ is equal to $(\mathit{Acc\_intro}\ x\ h)$. Based on this equality, we have
the following rule (with some type information omitted):

\[
(\mathrm{Rec}_{wf_<}\ F\ x) \rightsquigarrow (F\ x\ \lambda y : A.\lambda p : y < x.(\mathit{Acc\_rec}\ (\lambda z.\lambda p'.(F\ z))\ y\ (h\ y\ p))) \qquad (1)
\]


What is unfortunate with this rule is the presence of Acc_rec instead of $\mathrm{Rec}_{wf_<}$
in the right hand side. Certainly, a rule of the following form would be preferable:

\[
(\mathrm{Rec}_{wf_<}\ F\ x) \rightsquigarrow (F\ x\ \lambda y : A.\lambda p : y < x.(\mathrm{Rec}_{wf_<}\ F\ y)) \qquad (2)
\]

It is the objective of this work to provide a systematic method to provide
rule 2 as an equality theorem.
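
The observation that every accessibility proof is equal to an instance of the constructor can be stated as follows in current Coq syntax. This is only a sketch: it uses Acc_intro and Acc_inv from the standard library, the lemma name acc_is_intro is an assumption, and the witness h is taken to be the inversion of the given proof.

```coq
(* Every proof a of (Acc R x) is equal to Acc_intro applied to some h;
   here h is obtained from a itself through Acc_inv. *)
Lemma acc_is_intro (A : Type) (R : A -> A -> Prop) (x : A) (a : Acc R x) :
  a = @Acc_intro A R x (fun (y : A) (p : R y x) => @Acc_inv A R x a y p).
Proof.
  (* Acc has a single constructor, so case analysis yields one case. *)
  destruct a.
  reflexivity.
Qed.
```

The equality is proved by case analysis, so it is a propositional equality and not a reduction: this is why rule 2 is sought as a theorem rather than as a computation rule.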


3 Producing the Fix-Point Equation 


We can break down the problem into two parts. The first part shows that pro- 
ving the fix-point equation can be reduced to proving a specific property of the 
function F’. The second part describes a method to prove that property for a 
class of functions. 


3.1 Reducing to the Step Hypothesis 


The crux of fix-point equation proofs is to show that the values of proofs really 
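are irrelevant. For any value x in A, there exists a term T obtained by a certain
number $n_x$ of unrollings of F, such that $(\mathit{Acc\_rec}\ \lambda y.\lambda h.(F\ y)\ x\ h) = (T\ x)$. Thus,
if F does not use the actual value of proof arguments, $(\mathit{Acc\_rec}\ \lambda y.\lambda p.(F\ y)\ x\ h)$
does not either. Since the term h in this expression is a proof, its value is not
used and it does not matter whether h is of the form $(wf_<\ x)$ or $(h'\ x\ h'')$.
The transfer theorem has the following statement:

\[
\begin{array}{l}
\forall F : \Pi x : A.\ (\Pi y : A.\ y < x \to (B\ y)) \to (B\ x). \\
\quad (\forall x : A.\ \forall f : (\Pi y : A.\ (B\ y)).\ \forall g : (\Pi y : A.\ y < x \to (B\ y)). \\
\qquad (\forall y : A.\ \forall h : y < x.\ (g\ y\ h) = (f\ y)) \\
\qquad \Rightarrow (F\ x\ \lambda y : A.\lambda h : (y < x).(g\ y\ h)) = \\
\qquad\quad\ (F\ x\ \lambda y : A.\lambda h : (y < x).(f\ y))) \\
\quad \Rightarrow \forall x : A.\ (\mathrm{Rec}_{wf_<}\ F\ x) = (F\ x\ \lambda y : A.\lambda p : y < x.(\mathrm{Rec}_{wf_<}\ F\ y)).
\end{array}
\]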
The proof of this theorem proceeds by an induction over the accessibility of 
x with respect to the transitive closure of <. See appendix A for a Coq proof. 
We shall call step hypothesis the only hypothesis of this transfer theorem. 
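
For comparison, the standard library of current Coq releases provides a fix-point operator Fix and a lemma Fix_eq (module Coq.Init.Wf) whose statement has the same overall shape as the transfer theorem. The sketch below restates it with this paper's names. One difference is worth noting: the hypothesis required by Fix_eq compares two functions that are both restricted to smaller arguments, whereas the step hypothesis above compares a restricted function g with a total function f. The names Transfer, step and fix_point_equation are assumptions made for this illustration.

```coq
Section Transfer.
  Variables (A : Type) (R : A -> A -> Prop).
  Hypothesis Rwf : well_founded R.
  Variable B : A -> Type.
  Variable F : forall x : A, (forall y : A, R y x -> B y) -> B x.

  (* Library form of the step hypothesis: F cannot distinguish two
     pointwise-equal functions on the predecessors of x. *)
  Hypothesis step :
    forall (x : A) (f g : forall y : A, R y x -> B y),
      (forall (y : A) (p : R y x), f y p = g y p) -> F x f = F x g.

  (* The fix-point equation then follows from the library lemma. *)
  Lemma fix_point_equation :
    forall x : A,
      Fix Rwf B F x = F x (fun (y : A) (_ : R y x) => Fix Rwf B F y).
  Proof. exact (Fix_eq Rwf B F step). Qed.
End Transfer.
```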


3.2 Using Extensionality 


The axiom of extensionality states that two functions are equal as soon as their 
values are equal on every possible argument. It has the following statement: 


\[
\forall A : \mathit{Type}.\ \forall B : A \to \mathit{Set}.\ \forall f, g : (\Pi x : A.\ (B\ x)).\ (\forall x : A.\ (f\ x) = (g\ x)) \to f = g
\]


Obviously, the step hypothesis is a direct consequence of extensionality. If 
this axiom is added, then the question of the fix-point equation is solved. The 
rest of this paper describes how one can avoid using extensionality. 
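
To make the first remark concrete, here is a sketch in current Coq syntax that derives the step hypothesis from the extensionality axiom, as provided by the standard library module FunctionalExtensionality (axiom functional_extensionality_dep). The lemma name step_from_ext is an assumption made for this illustration.

```coq
Require Import FunctionalExtensionality.

Lemma step_from_ext (A : Type) (R : A -> A -> Prop) (B : A -> Type)
    (F : forall x : A, (forall y : A, R y x -> B y) -> B x) :
  forall (x : A) (f : forall y : A, B y) (g : forall y : A, R y x -> B y),
    (forall (y : A) (h : R y x), g y h = f y) ->
    F x (fun y h => g y h) = F x (fun y _ => f y).
Proof.
  intros x f g Heq.
  (* With extensionality, the two functional arguments are equal. *)
  assert (E : (fun (y : A) (h : R y x) => g y h)
              = (fun (y : A) (_ : R y x) => f y)).
  { apply functional_extensionality_dep; intro y.
    cbv beta.
    apply functional_extensionality_dep; intro h.
    exact (Heq y h). }
  rewrite E.
  reflexivity.
Qed.
```

The proof never inspects F, which is why extensionality settles the question for every functional at once, at the price of an axiom.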


4 A Syntactic Construction of the Step Hypothesis 


For a given F’, the step hypothesis has the following statement: 


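\[
\begin{array}{l}
(\forall x : A.\ \forall f : (\Pi y : A.\ (B\ y)).\ \forall g : (\Pi y : A.\ y < x \to (B\ y)). \\
\quad (\forall y : A.\ \forall h : y < x.\ (g\ y\ h) = (f\ y)) \\
\quad \Rightarrow (F\ x\ \lambda y : A.\lambda h : (y < x).(g\ y\ h)) = \\
\qquad\ (F\ x\ \lambda y : A.\lambda h : (y < x).(f\ y)))
\end{array}
\]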


Constructing a proof of this statement without using extensionality will be 
done by a structural analysis of F. 


4.1 Some Notation 


Given a set of sorts S (S will typically contain categories like Prop, Set, Type), 
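a term is an element of the language $\mathcal{T}$ defined as follows:

\[
\mathcal{T} ::= \lambda x : \mathcal{T}.\mathcal{T} \mid (\mathcal{T}\ \mathcal{T}) \mid \Pi x : \mathcal{T}.\mathcal{T} \mid S \mid x.
\]

This is the usual set of terms for type theory, see for example section 2 in [11].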

We will not describe the typing rules for this language here, but we shall 
assume we work in a context where a collection of inductive types has been 
defined and all the terms we manipulate are well-typed. In particular, for every 
inductive type ty we will assume there exists a function corresponding to a case 
operator, and a function corresponding to reasoning by cases, whose types are 
obtained systematically from the inductive definition following the methods in 
[11]. The case operator is a simplified instance of the recursion operator, where 
recursive values are not used, while the “reasoning-by-cases” operator is deduced 
similarly from the induction principle associated to the inductive type. 

For instance, in the case of natural numbers, the case operator is the function 
nat_case with the following type: 


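\[
\begin{array}{l}
\mathit{nat\_case} : \\
\quad \Pi B : \mathit{nat} \to \mathit{Set}.\ (B\ 0) \to (\Pi m : \mathit{nat}.\ (B\ (S\ m))) \to \Pi n : \mathit{nat}.\ (B\ n)
\end{array}
\]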


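The "reasoning-by-cases" operator has the following type:

\[
\begin{array}{l}
\mathit{nat\_case'} : \\
\quad \forall P : \mathit{nat} \to \mathit{Prop}.\ (P\ 0) \to (\forall m : \mathit{nat}.\ (P\ (S\ m))) \to \forall n : \mathit{nat}.\ (P\ n)
\end{array}
\]

For an arbitrary type ty, we will suppose it has k constructors

\[
c_i : \Pi x_1 : t_{i,1} \ldots \Pi x_{n_i} : t_{i,n_i}.\ ty \qquad (1 \le i \le k).
\]

The case operator for this inductive type will be denoted ty_case, with the
following type:

\[
\begin{array}{l}
\mathit{ty\_case} : \Pi B : ty \to \mathit{Set}. \\
\quad (\Pi x_1 : t_{1,1} \ldots \Pi x_{n_1} : t_{1,n_1}.\ (B\ (c_1\ x_1 \ldots x_{n_1}))) \to \\
\quad \vdots \\
\quad (\Pi x_1 : t_{k,1} \ldots \Pi x_{n_k} : t_{k,n_k}.\ (B\ (c_k\ x_1 \ldots x_{n_k}))) \to \\
\quad \Pi x : ty.\ (B\ x)
\end{array}
\]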


When considering a usage of ty_case, it will have the form

\[ (\mathit{ty\_case}\ t\ l_1 \ldots l_k\ V) \]

where each $l_i$ will have the form:

\[ l_i = \lambda x_1 : t_{i,1} \ldots \lambda x_{n_i} : t_{i,n_i}.\ b_i \]

and we shall sometimes abbreviate this expression as:

\[ l_i = \lambda x_{(i)}.b_i \]

For the type ty, we shall also assume there exists a term ty_case' with type

\[
\begin{array}{l}
\mathit{ty\_case'} : \Pi P : ty \to \mathit{Prop}. \\
\quad (\Pi x_1 : t_{1,1} \ldots \Pi x_{n_1} : t_{1,n_1}.\ (P\ (c_1\ x_1 \ldots x_{n_1}))) \to \\
\quad \vdots \\
\quad (\Pi x_1 : t_{k,1} \ldots \Pi x_{n_k} : t_{k,n_k}.\ (P\ (c_k\ x_1 \ldots x_{n_k}))) \to \\
\quad \Pi x : ty.\ (P\ x).
\end{array}
\]


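This term ty_case' will be used to construct proofs instead of values.
We will also assume there exist two theorems eq_ind_r and eq_ind, with the
following statements (types):

\[
\begin{array}{l}
\mathit{eq\_ind\_r} : \forall A : \mathit{Set}.\ \forall x : A.\ \forall P : A \to \mathit{Prop}.\ (P\ x) \to \forall y : A.\ y = x \to (P\ y), \\
\mathit{eq\_ind} : \forall A : \mathit{Set}.\ \forall x : A.\ \forall P : A \to \mathit{Prop}.\ (P\ x) \to \forall y : A.\ x = y \to (P\ y),
\end{array}
\]

and a theorem refl_equal with the following statement:

\[ \mathit{refl\_equal} : \forall A : \mathit{Set}.\ \forall x : A.\ x = x. \]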


On several occasions, we will consider contexts which are expressions with
holes in them, to be filled with some values. A context will be denoted $C[\cdot;\ldots;\cdot]$
and the same context where holes have been filled in with values will be denoted
$C[x_1;\ldots;x_n]$. We will also consider the operation of substituting all occurrences
of a term with another. This operation will be denoted $C[g := h]$. This term is
equal to $C[h;\ldots;h]$, where $C[\cdot;\ldots;\cdot]$ is the context such that $C[g;\ldots;g] = C$
and g does not occur free in $C[\cdot;\ldots;\cdot]$.


4.2 Constructing the Proof in Absence of Bound Variables 


For some values of X, computing (F X g) may lead to some term where g occurs.
Let us say it is convertible to

\[ T = C[(g\ T'\ H)], \]

such that all variables occurring in the terms T' and H are free in T. Suppose
we have a hypothesis Heq,

\[ \mathit{Heq} : \forall y : A.\ \forall h : y < x.\ (g\ y\ h) = (f\ y). \]

Let $C' = C[g := \lambda y : A.\lambda h : y < x.(f\ y)]$; we can reduce the task of proving
$C[(g\ T'\ H)] = C'$ to the task of proving $C[(f\ T')] = C'$ using eq_ind_r and Heq.
If p is a proof of $C[(f\ T')] = C'$, then

\[ (\mathit{eq\_ind\_r}\ (B\ T')\ (f\ T')\ (\lambda n.C[n] = C')\ p\ (g\ T'\ H)\ (\mathit{Heq}\ T'\ H)) \]

is a proof of $C[(g\ T'\ H)] = C'$. We can repeat this process until there is no
more instance of g in the left-hand side, and conclude using refl_equal. Note the
constraint about free variables in T' and H that must also be free in $C[(g\ T'\ H)]$.
If this constraint is not respected the proof term will not be well typed.
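
One rewriting step of this construction can be written out on a tiny instance. The sketch below, in current Coq syntax, takes B to be the constant type nat and the context C[n] to be (S n), so that C' is (S (f t)); all names in the section are assumptions made for this illustration, and eq_ind_r and eq_refl are the standard library counterparts of the theorems eq_ind_r and refl_equal assumed above.

```coq
Section NoBoundVariable.
  Variables (A : Type) (R : A -> A -> Prop) (x : A).
  Variable f : A -> nat.
  Variable g : forall y : A, R y x -> nat.
  Hypothesis Heq : forall (y : A) (h : R y x), g y h = f y.
  Variables (t : A) (H : R t x).

  (* The proof p of C[(f t)] = C' is reflexivity, since no occurrence
     of g remains; eq_ind_r transports it to C[(g t H)] = C'. *)
  Definition one_step : S (g t H) = S (f t) :=
    @eq_ind_r nat (f t) (fun n : nat => S n = S (f t))
              (@eq_refl nat (S (f t))) (g t H) (Heq t H).
End NoBoundVariable.
```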


4.3 Handling Case Operators


Bound variables are introduced when one uses the case operator of some induc- 
tive type. Intuitively, the case operators perform some form of pattern matching 
on values in an inductive type. This pattern matching makes several cases appear, 
corresponding to the various constructors of the inductive type. Most
constructors carry values and the $\lambda$-abstractions are used in the case operators to give
access to these values, like pattern-matching rules in functional programming.

For instance, let us consider a well-founded definition of a discrete logarithm 
function: 


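\[
\begin{array}{l}
(F_{log}\ x) = \\
\quad (\mathit{nat\_case}\ \lambda x : \mathit{nat}.\Pi f : (\Pi y : \mathit{nat}.\ (y < x) \to \mathit{nat}).\mathit{nat} \\
\qquad \lambda f : (\Pi y : \mathit{nat}.\ (y < 0) \to \mathit{nat}).0 \\
\qquad \lambda n : \mathit{nat}.\lambda f : (\Pi y : \mathit{nat}.\ (y < (S\ n)) \to \mathit{nat}).(S\ (f\ (\mathit{div2}\ (S\ n))\ (th\ n))) \\
\qquad x) \\
\mathit{log} = (\mathrm{Rec}_{wf_<}\ F_{log})
\end{array}
\]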


This definition assumes we are given a function div2 that divides a number by
2 and a theorem $th : \forall n : \mathit{nat}.\ (\mathit{div2}\ (S\ n)) < (S\ n)$, so that (th n) justifies the
recursive call on (div2 (S n)). The second and third arguments
of nat_case describe the computation to perform in two different cases. This is
where $\lambda$-abstractions occur. The bound variable n represents the predecessor of
the function argument in the case where this argument is non-zero.

The idea for this part of our work is to follow the same structure of pattern 
matching, not to compute the value returned by the function, but to construct a 
different proof of equality in each case. The recursive analysis of F’ is decomposed 
into two phases. The first phase corresponds to the subterms of F that have a 
functional type, receiving a function as argument. The second part is the part 
where the function argument is already received. One switches from the first 
phase to the second when crossing a A-abstraction. 


First phase: sub-terms of F with a functional argument. The expression
F has type $\Pi x : A.\ (\Pi y : A.\ y < x \to (B\ y)) \to (B\ x)$. It is a function that can
receive (at least) two arguments. We shall always assume that F is of the form
$\lambda x.F'$.
The expression F' must be well typed in a context where x has type A and
F' must have the type:

\[ (\Pi y : A.\ y < x \to (B\ y)) \to (B\ x). \]


In a more general setting, the problem is the following one: given a type
environment $\Gamma$, a term X well-typed in $\Gamma$, and composed only of applications of
constructors to other constructors and variables occurring in $\Gamma$, given a variable
f declared in $\Gamma$ with type $f : \Pi y : A.\ (B\ y)$, prove the following equality:

\[
\begin{array}{l}
\forall g : (\Pi y : A.\ y < X \to (B\ y)).\ (\forall y : A.\ \forall h : y < X.\ (g\ y\ h) = (f\ y)) \to \\
\quad (F'\ \lambda y : A.\lambda h : y < X.(f\ y)) = (F'\ \lambda y : A.\lambda h : y < X.(g\ y\ h))
\end{array}
\]


F' should be a function, but it would be too restrictive to assume that F' should
be a $\lambda$-abstraction. In fact we shall consider two cases:


1. it is a $\lambda$-abstraction, $F' = \lambda g.T$; in this case we can switch to the second
   phase for T, after having added in the context $g : \Pi y : A.\ y < X \to (B\ y)$
   and $\mathit{Heq} : \forall y : A.\ \forall h : y < X.\ (g\ y\ h) = (f\ y)$. If this yields a proof p, then

   \[ \lambda g : (\Pi y : A.\ y < X \to (B\ y)).\lambda \mathit{Heq} : (\forall y : A.\ \forall h : y < X.\ (g\ y\ h) = (f\ y)).p \]

   is the requested proof.
2. it is built with a case expression, that is, F' has the following form:

   \[ F' = (\mathit{ty\_case}\ t\ l_1 \ldots l_k\ V) \]

   where t, $l_1$, ..., $l_k$ are formed as described in section 4.1. Because of the
   type of F', the term t should have the following form:

   \[ t = \lambda x : ty.\Pi g : (\Pi y : A.\ y < X[x] \to (B\ y)).t' \]

   where $X[\cdot]$ is a context such that $X[V] = X$ and t' is some arbitrary type.
   Note that $X[\cdot]$ may have no hole when V does not occur in X.


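   Recall the terms $l_i$ are such that $l_i = \lambda x_{(i)}.b_i$; we can recursively address the
   problem of finding a proof for formulas:

   \[
   \begin{array}{l}
   \forall g : (\Pi y : A.\Pi h : y < X[(c_i\ x_1 \ldots x_{n_i})].(B\ y)). \\
   \quad (\forall y : A.\ \forall h : y < X[(c_i\ x_1 \ldots x_{n_i})].\ (g\ y\ h) = (f\ y)) \to \\
   \quad (b_i\ \lambda y : A.\lambda h : y < X[(c_i\ x_1 \ldots x_{n_i})].(f\ y)) = (b_i\ g)
   \end{array}
   \]

   now with the term X being replaced with $X[(c_i\ x_1 \ldots x_{n_i})]$, and a new type
   environment $\Gamma'$ modified so that the new variables $x_1$, ..., $x_{n_i}$ have been
   added with types $x_j : t_{i,j}$. Let us suppose this yields k proofs called $p_i$. We
   can then construct a proof for the whole equality with the following shape:

   \[
   \begin{array}{l}
   (\mathit{ty\_case'} \\
   \quad \lambda x : ty.\forall g : (\Pi y : A.\ y < X[x] \to (B\ y)). \\
   \qquad (\forall y : A.\ \forall h : y < X[x].\ (g\ y\ h) = (f\ y)) \Rightarrow \\
   \qquad (\mathit{ty\_case}\ t\ l_1 \ldots l_k\ x\ \lambda y : A.\lambda h : y < X[x].(g\ y\ h)) = \\
   \qquad (\mathit{ty\_case}\ t\ l_1 \ldots l_k\ x\ \lambda y : A.\lambda h : y < X[x].(f\ y)) \\
   \quad \lambda x_{(1)}.p_1 \\
   \quad \vdots \\
   \quad \lambda x_{(k)}.p_k \\
   \quad V\ g\ \mathit{Heq})
   \end{array}
   \]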


Second phase: sub-terms of F where g occurs. In the second phase, the
variable g, corresponding to the function that may be called to represent
recursive calls, has a special status, and we are looking at a sub-term of F where g
may occur free. We want to provide the equality of this term with another term
where all free occurrences of g are replaced by $\lambda y : A.\lambda h : y < X.(f\ y)$.

We reduce the problem to the following one: given a type environment $\Gamma$, a
term X well-typed in $\Gamma$, and composed only of applications of constructors to
other constructors and variables occurring in $\Gamma$, given three variables f, g, and
Heq declared in $\Gamma$ with types $f : \Pi y : A.\ (B\ y)$, $g : \Pi y : A.\Pi h : y < X.(B\ y)$,
and $\mathit{Heq} : \forall y : A.\ \forall h : y < X.\ (g\ y\ h) = (f\ y)$, given an expression C, well typed
in $\Gamma$ where f does not occur, construct a proof for the equality:

\[ C = C[g := \lambda y.\lambda h : y < X.(f\ y)]. \]

We only consider the following cases:
We only consider the following cases: 


1. if C falls in the case described in section 4.2, then apply the corresponding
   method,
2. if C = (g T H), compute recursively a proof p for

   \[ T = T[g := \lambda y.\lambda h : y < X.(f\ y)], \]

   then the following expression is a good candidate for the requested proof:

   \[
   \begin{array}{l}
   (\mathit{eq\_ind}\ A\ T\ (\lambda n : A.(g\ T\ H) = (f\ n))\ (\mathit{Heq}\ T\ H) \\
   \quad T[g := \lambda y.\lambda h : y < X.(f\ y)]\ p)
   \end{array}
   \]

   However, this expression may be untypable. Writing T' for
   $T[g := \lambda y.\lambda h : y < X.(f\ y)]$, the type of this expression is
   supposed to be (g T H) = (f T'), but this expression is well-typed only if its
   type is well-typed. For an equality to be well-typed, both sides must have
   the same type. The type of (g T H) is (B T) and the type of (f T') is (B T').
   If these two expressions are not $\beta\iota$-convertible, our method will fail.

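3. if C = (C' g) and g does not occur in C', then switch back to the first phase
   to find a proof of:

   \[
   \begin{array}{l}
   \forall g : (\Pi y : A.\ (y < X) \to (B\ y)).\ (\forall y : A.\ \forall h : (y < X).\ (g\ y\ h) = (f\ y)) \Rightarrow \\
   \quad (C'\ g) = (C'\ \lambda y : A.\lambda h : y < X.(f\ y)),
   \end{array}
   \]

   then apply this proof to the current g and Heq.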
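4. if $C = (\mathit{ty\_case}\ t\ l_1 \ldots l_k\ V)$ and t, $l_1$, ..., $l_k$ are described as in section 4.1,
   let us suppose we are recursively able to find a proof $p_0$ of

   \[ V = V[g := \lambda y.\lambda h : y < X.(f\ y)] \]

   in the same context $\Gamma$, for the same value X, and the same f, g, and Heq,
   and let us suppose we are recursively able to find k proofs $p_i$ for the following
   statements:

   \[ b_i = b_i[g := \lambda y.\lambda h : y < X_i.(f\ y)] \]

   where $X_i$ is $X[V := (c_i\ x_1 \ldots x_{n_i})]$, with new typing contexts $\Gamma_i$ such that
   declarations have been added for each of the variables $x_1 : t_{i,1}, \ldots, x_{n_i} : t_{i,n_i}$
   (modulo a renaming of variables to make sure the new variables are distinct
   from variables already present in $\Gamma$), and where the two variables g, Heq
   have their type changed in $\Gamma_i$ to

   \[
   g : \Pi y : A.\ y < X_i \to (B\ y) \qquad \mathit{Heq} : \forall y : A.\ \forall h : y < X_i.\ (g\ y\ h) = (f\ y).
   \]

   Writing $l'_i$ for the terms $\lambda x_{(i)}.b_i[g := \lambda y.\lambda h : y < X_i.(f\ y)]$, the term

   \[
   \begin{array}{l}
   p = (\mathit{ty\_case'} \\
   \quad \lambda x : ty.\forall g : (\Pi y : A.\ y < X[V := x] \to (B\ y)). \\
   \qquad \forall \mathit{Heq} : (\forall y : A.\ \forall h : y < X[V := x].\ (g\ y\ h) = (f\ y)). \\
   \qquad (\mathit{ty\_case}\ t\ l_1 \ldots l_k\ x) = (\mathit{ty\_case}\ t\ l'_1 \ldots l'_k\ x) \\
   \quad \lambda x_1 : t_{1,1} \ldots \lambda x_{n_1} : t_{1,n_1}.\lambda g : (\Pi y : A.\ y < X_1 \to (B\ y)). \\
   \qquad \lambda \mathit{Heq} : (\forall y : A.\ \forall h : y < X_1.\ (g\ y\ h) = (f\ y)).p_1 \\
   \quad \vdots \\
   \quad \lambda x_1 : t_{k,1} \ldots \lambda x_{n_k} : t_{k,n_k}.\lambda g : (\Pi y : A.\ y < X_k \to (B\ y)). \\
   \qquad \lambda \mathit{Heq} : (\forall y : A.\ \forall h : y < X_k.\ (g\ y\ h) = (f\ y)).p_k \\
   \quad V\ g\ \mathit{Heq})
   \end{array}
   \]

   is a proof of:

   \[ (\mathit{ty\_case}\ t\ l_1 \ldots l_k\ V) = (\mathit{ty\_case}\ t\ l'_1 \ldots l'_k\ V) \]

   and

   \[
   \begin{array}{l}
   (\mathit{eq\_ind\_r}\ ty \\
   \quad V \\
   \quad \lambda x : ty.(\mathit{ty\_case}\ t\ l_1 \ldots l_k\ V) = (\mathit{ty\_case}\ t\ l'_1 \ldots l'_k\ x) \\
   \quad p \\
   \quad V[g := \lambda y.\lambda h : y < X.(f\ y)] \\
   \quad p_0)
   \end{array}
   \]

   is a proof of $C = C[g := \lambda y.\lambda h : y < X.(f\ y)]$.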


4.4 The Class of Acceptable Functions 


We put several restrictions on the form of functions for which our method will 
work. These restrictions can be summarized as follows: the function F’ must be 
well-typed and must only use its functional argument in two possible ways: 




1. The functional argument is fully applied 
2. The functional argument is itself passed as argument to a case construct, 
that is, a function of the form ty_case. 


In particular, this precludes situations where F would pass its functional argu- 
ment to an auxiliary function. Attempting to summarize these constraints as a 
formal language description we have the following result: 


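\[
\begin{array}{lcl}
\mathcal{T}' & ::= & \lambda x : \mathcal{T}.\mathcal{T}_1 \\
\mathcal{T}_1 & ::= & \lambda f : (\Pi y : \mathcal{T}.\Pi h : (R\ y\ x).\mathcal{T}).\mathcal{T}_{2,f} \mid (\mathit{ty\_case}\ \lambda x_{(1)}.\mathcal{T}_1 \cdots \lambda x_{(k)}.\mathcal{T}_1\ \mathcal{T}) \\
\mathcal{T}_{2,f} & ::= & (\mathcal{T}_1\ f) \mid (f\ \mathcal{T}_{2,f}\ \mathcal{T}_{2,f}) \mid (\mathcal{T}_{2,f}\ \mathcal{T}_{2,f}) \mid \lambda x : \mathcal{T}.\mathcal{T}_{2,f} \mid \\
 & & (\mathit{ty\_case}\ \lambda x_{(1)}.\mathcal{T}_{2,f} \cdots \lambda x_{(k)}.\mathcal{T}_{2,f}\ \mathcal{T}_{2,f})
\end{array}
\]

The notation $\mathcal{T}_{2,f}$ is used to stress the fact that the functional argument must
really be used in F in a more restrictive manner than any other $\lambda$-term. In
particular, no instance of $\mathcal{T}_{2,f}$ may be f, and f may not occur free in $\mathcal{T}_1$ in the
case $(\mathcal{T}_1\ f)$ of $\mathcal{T}_{2,f}$.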


5 Usage in the Coq System 


In the Coq system, the type Acc is defined exactly as it is described in this paper. 
The function $\mathrm{Rec}_{wf_<}$ is represented by a function named well_founded_definition.
This function takes as parameters the input type (corresponding to A throughout
this paper), the relation on this type (<), the theorem stating that this relation
is well-founded ($wf_<$), a function mapping elements of the input type to sets
(B), and a function (F).

This method of defining well-founded recursive functions is specially supported
in the syntax of the Program tactic, which makes it possible to describe an
algorithm without giving a complete justification that it is well defined, the necessary
proofs being left as side conditions. Because the extraction mechanism provided
by Coq uses the sort mechanism to delete proof arguments in functions,
well-founded recursive functions are naturally extracted as recursive functions that
can be used to compute in Caml or a lazy programming language.

Programmers in Coq do not use the ty_case operators to write down their 
algorithms. Instead, the Coq system provides a pattern matching construct that 
is equivalent to ty_case or ty_case' constructs, the distinction being done
automatically depending on the type of the result. An advantage of this approach is
that the same pattern matching construct is used to describe the function and 
the proof. A drawback is that a single pattern matching construct may expand 
into a large number of ty_case constructs if the patterns occurring on the left 
hand side of pattern-matching rules are deep. 

Also, pattern-matching constructs can be much simpler to write than ty_case 
constructs when the types of all cases are equal. In this case, the user does not 
need to provide this type, which is computed automatically by the proof system 
from the type of the first branch. In this case, we will say that the
pattern-matching construct becomes simply typed.

We have used our method to construct by hand fixpoint equalities for a small
variety of functions including Ackermann’s function, well-founded encodings of
factorial, remainder, a list-based quicksort, and some auxiliary functions working
on lists for a Java byte-code verifier. In all the cases treated so far, the functions
defined by well-founded recursion had a non-dependent type, so that the problem
raised in the second phase (case 2) did not occur.

Based on our method, we are constructing a procedure to generate the
equality automatically when it is possible. This procedure will be provided in Coq
with the following syntax:


Wf_Definition f A B order th_wf F. 


where f is the name of the new recursive function to be defined, A is the input
type, B is the function that computes the output type from the input value, order
is a binary relation on A, th_wf is a proof that order is well-founded, and F is the
value of the functional that describes the function being defined (the functional
of which f will be the fix-point, so to speak). This command defines the function
f and the theorem f_fxp_eqn, whose statement is:


$$\forall x : A.\ (f\ x) = (F\ x\ \lambda y : A.\ \lambda h : (\mathit{order}\ y\ x).\ (f\ y))$$

For instance, here is how our tool is to be used to define the log function.


Section define_ren.
Parameters div2 : nat -> nat; th : (n:nat)(lt (div2 n) (S n)).


Wf_Definition log nat [_:nat]nat lt lt_wf
  [x:nat]<[x:nat](f:(y:nat)(lt y x)->nat)nat>Cases x of
      O => [f:?](O)
    | (S n) => [f:?](S (f (div2 (S n)) (th n)))
  end.


The fixpoint equation we obtain is: 


log_fxp_eqn :
  (x:nat) (log x)
  =(<[x0:nat]((y:nat)(lt y x0)->nat)->nat>
      Cases x of
        O => [_:((y:nat)(lt y (O))->nat)](O)
      | (S n) =>
          [f:((y:nat)(lt y (S n))->nat)](S (f (div2 (S n)) (th n)))
      end [y:nat; _:(lt y x)](log y))


But it is easy to obtain a simpler formulation, where dependently typed case
expressions become simply typed:


Theorem log_simple_equation :
  (x:nat) (log x) =
    (Cases x of O => (O) | (S n) => (S (log (div2 (S n)))) end).
Intros x; Rewrite log_fxp_eqn; Case x; Simpl; Auto.
Qed.
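
Read from left to right, the simplified equation is an ordinary recursive program. The following Python sketch only illustrates that reading; it is not output of the tool. It assumes that div2 is integer halving, whereas the paper leaves div2 as a parameter, and the name `log` simply mirrors the Coq definition.

```python
def div2(n: int) -> int:
    # Assumption: div2 is integer halving; the paper keeps it abstract.
    return n // 2


def log(x: int) -> int:
    # Mirrors log_simple_equation:
    #   log O = O      log (S n) = S (log (div2 (S n)))
    if x == 0:
        return 0
    return 1 + log(div2(x))
```

With this choice of div2 the recursive argument is strictly smaller than x whenever x is positive, which is the role played by the order and its well-foundedness proof in the Coq definition.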


6 Conclusion 


The practical result of this work is a tool to generate an equation that is usually 
difficult to obtain. In its simplest form, the equation can be as simple as the 
function definitions that can be obtained with the Program tactic, where proof 
information also disappears [10]. 

The desire to produce proofs of fix-point equalities without using
extensionality looks very much like a theoretical rather than a pragmatic question. As mere
users of a type-theory based proof system, we do not know how well or how badly
the axiom of extensionality interferes with other aspects of the logic we use. The
theoretical result brought by this paper is that we do not need to answer this
question for a reasonable class of functions.

In future work, we would like to study how this method can be extended 
to handle functions containing recursion operators and mutual recursion. We 
also want to provide a simplified version of the fix-point equation where proof 
arguments do not appear, following the example of log_simple_equation. This 
improvement can be implemented by a simple partial evaluation. It should also 
be possible to re-use the results of Parent [10] and Slind [15] towards the support 
of plain functional programming in proof systems.


A Proof of the Transfer Theorem


This text can be checked mechanically using Coq Version 6.3.1. 


Scheme Acc_ind2 := Induction for Acc Sort Prop.
Transparent well_founded_induction.
Variables A:Set; R:A -> A -> Prop;
          R_wf:(well_founded A R); B:A -> Set;
          F:(x : A) ((y : A) (R y x) -> (B y)) -> (B x).


Definition f := (well_founded_induction A R R_wf B F).


Inductive Trans : A -> A -> Prop :=
    Tc1: (x, y : A) (R x y) -> (Trans x y)
  | Tc2: (x, y, z : A) (R x y) -> (Trans y z) -> (Trans x z).
Hints Resolve Tc1 Tc2 ex_intro.


Theorem trans_balance:
  (x, y, z : A) (R x y) -> (Trans y z)
   -> (Ex [t : A] (Trans x t) /\ (R t z)).
Intros x y z H H0; Generalize x H; Clear H x; Elim H0; EAuto.
Intros x y0 z0 H H1 H2 x0 H3; Elim (H2 x); Auto.
Intros x1 H4; Exists x1; Elim H4; EAuto. Qed.


Theorem well_founded_ind2:
  (P : A -> Prop)
  ((x : A) ((y : A) (Trans y x) -> (P y)) -> (P x)) -> (a : A) (P a).
Intros P H a; Apply H; Cut ((y : A) (Trans y a) -> (P y)) /\ (P a).
Intros H0; Elim H0; Auto.
Elim a using (well_founded_ind A R R_wf).
Intros x H0; Split. Intros y H1; Inversion H1. Elim (H0 y); Auto.
Elim trans_balance with 1 := H2 2 := H3.
Intros x1 H7; Elim H7; Intros H8 H9. Elim (H0 x1); Auto.
Apply H; Intros y H1; Generalize H0; Inversion H1.
Intros H5; Elim (H5 y); Auto.
Elim trans_balance with 1 := H2 2 := H3.
Intros x1 H6 H7; Elim H6; Intros H8 H9.
Elim H7 with y := x1; Auto. Qed.


Theorem is_trans:
  (x, y, z : A) (Trans x y) -> (R y z) -> (Trans x z).
Induction 1; EAuto.
Qed.
Hints Resolve is_trans.


Theorem wf_transfer:
  ((x : A) (f' : (y : A) (B y))
   (g : (y : A) (R y x) -> (B y))
   ((y : A) (h : (R y x)) (g y h) = (f' y))
   -> (F x [y : A] [h : (R y x)] (g y h)) =
      (F x [y : A] [_ : (R y x)] (f' y)))
  -> (x : A) (f x) = (F x [y : A] [h : (R y x)] (f y)).
Unfold f; Intros H x; Elim x using well_founded_ind2; Intros x'.
Unfold 3 well_founded_induction.
Elim (R_wf x') using Acc_ind2; Simpl.
Intros x'' accessible H_rec_acc H_rec_well_founded.
Apply H with
  g := [y : A] [h : (R y x'')]
       (Acc_rec A R B [x0 : A]
          [_ : (y1 : A) (R y1 x0) -> (Acc A R y1)]
          [g' : (y1 : A) (R y1 x0) -> (B y1)]
          (F x0 g') y (accessible y h)).
Intros y h; Rewrite H_rec_well_founded; EAuto. Qed.


Abstract. This article describes a set of derived inference rules and an
abstract reduction machine using them that allow the implementation
of an interpreter for HOL terms, with the same complexity as with ML
code. The latter fact allows us to use HOL as a computer algebra system
in which the user can implement algorithms, provided he has proved them
correct.


1 Introduction 


This article describes another step towards using HOL [10] as a programming
language. In order to preserve consistency, most logical systems forbid
non-terminating functions. In HOL, library bossLib provides an automated and
uniform mechanism for the definition of recursive functions. Termination proofs are
handled by another library, tfl [15]. What we provide here is a tool to evaluate
HOL expressions seen as programs of an ML-like language.

The ability to compute the normal form of expressions efficiently has many
applications. One of the most obvious is to solve numerical equations that only
require calculations. For instance, we would not say that proving 1 + 2 = 3 needs
reasoning. We can simply assign an algorithm to + that performs addition. Then,
the above theorem is proven simply by running the algorithm with arguments 1
and 2, which returns 3.

In some logical formalisms, this notion of computation is even considered
essential. This is particularly obvious with type theoretic systems, such as Pure
Type Systems [2,3], where the conversion rule asserts that two statements equal
with respect to $\beta$-conversion have the same proofs. As a consequence, $1 + 2 = 3$
would be proven by reflexivity (assuming $\beta$-conversion is extended with rules to
compute with natural numbers).

However, in HOL, only $\alpha$-convertible terms are identified and, following LCF’s
idea [11], every theorem must be produced by using the basic inference rules,
such as reflexivity of equality or Modus Ponens. The goal is to build the theorem
$\vdash 1 + 2 = 3$ (an equality derivation) given the expression $1 + 2$. According to
LCF’s terminology, this is a conversion [13,14].

Let us illustrate our idea of computing within the logic on a very simple
example. Assuming we represent natural numbers with 0 and a successor function
S, we can use the following theorems to compute with addition:


$$0 + m = m \qquad\qquad S(n) + m = S(n + m)$$


Each of them corresponds to one step of simplification. Using transitivity of
equality, we can build simplification paths of arbitrary length; the goal we set is
to build a reduction path from the original term to its normal form. If m and
n are two canonical numbers (built only upon 0 and S), then these equations
allow us to compute the canonical representation of the sum of m and n. This
is of course done in the same way as running the following ML program:


datatype num = O | S of num;
fun O + m = m
  | (S n) + m = S (n + m);
> val + = fn : num * num -> num


The idea is to interpret equations as clauses defining an ML program, and we 
mimic the way ML programs are evaluated. Put in another way, we follow the 
algorithm of an ML interpreter, the term to be reduced being equivalent to the 
state of this interpreter. 
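
To make the reading of equations as program clauses concrete, here is a small Python sketch that normalizes a sum of canonical numerals with the two addition equations and counts how many equation instances were used. It works on plain data and builds no theorem; the names `O`, `S`, `Add`, `num`, `to_int` and `normalize` are introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class O:
    """The constant 0."""


@dataclass(frozen=True)
class S:
    pred: object  # S(n)


@dataclass(frozen=True)
class Add:
    left: object
    right: object  # left + right


def num(k):
    """Canonical numeral S(...S(O)...) with k successors."""
    t = O()
    for _ in range(k):
        t = S(t)
    return t


def to_int(t):
    """Count the successors of a canonical numeral."""
    k = 0
    while isinstance(t, S):
        k, t = k + 1, t.pred
    return k


def normalize(t):
    """Return (normal form, number of equation instances used)."""
    if isinstance(t, O):
        return t, 0
    if isinstance(t, S):
        p, k = normalize(t.pred)
        return S(p), k
    # t is an Add node: evaluate both arguments first (call-by-value).
    left, k1 = normalize(t.left)
    right, k2 = normalize(t.right)
    if isinstance(left, O):
        return right, k1 + k2 + 1          # 0 + m = m
    inner, k3 = normalize(Add(left.pred, right))
    return S(inner), k1 + k2 + 1 + k3      # S(n) + m = S(n + m)
```

For 1 + 2 the sketch uses the successor clause once and the zero clause once, which corresponds to a reduction path of length two.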

An essential point is to produce proofs of such theorems as efficiently as
possible. We will show that this leads to new proof techniques, where computation is
not only used to evaluate numerical expressions, but is applied to symbolic
computations. This establishes a connection between theorem proving and computer
algebra systems.

A last motivation we give here for computing within the logic is testing the 
specifications. One can check on some examples that what is specified actually 
behaves as intended when formalized. As a simple example, one can define EVEN 
with 


$$\mathrm{EVEN}\ 0 = T \qquad \mathrm{EVEN}\ 1 = F \qquad \mathrm{EVEN}\ (S\ (S\ n)) = \mathrm{EVEN}\ n$$


One can then try to run the specification and check that EVEN 127 computes
to F, etc. This may be useful in the case of more elaborate specifications.
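
As an illustration of running such a specification, the three clauses can be transcribed into a few lines of Python. This is only a sketch over machine integers rather than unary numerals, and the function name `even` is chosen here.

```python
def even(n: int) -> bool:
    # EVEN 0 = T and EVEN 1 = F are the base clauses;
    # EVEN (S (S n)) = EVEN n is applied while n >= 2.
    while n >= 2:
        n -= 2
    return n == 0
```

Checking that `even(127)` is false plays the same role as checking that EVEN 127 computes to F.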

The structure of the paper is the following: we first explain why the existing
tools of HOL are unable to solve our problem in the general case. Then, we
introduce abstract machines as a way of implementing reduction functions for
pure $\lambda$-calculus and then $\lambda$-calculus with constants. Section 4 shows how to
tailor derived inference rules in order to make the reduction function return
a derivation instead of the mere normal form. Summarizing all this leads to
the abstract machine we implemented in HOL (section 5). We finally test our
conversion on examples of various sizes and remark that it has the same asymptotic
complexity as ML code.


2 Comments on the Strategy


The task discussed above can be done by already existing tools, such as the
conversion REWRITE_CONV. The latter, given a list of theorems, repeatedly performs
rewrites on a term, until this is not possible anymore. Free variables may be
instantiated so that the left-hand side of a theorem becomes equal to the current
term. For more details, see [13]. There are several other conversions (the most
powerful being simpLib) that can perform higher order matching or
conditional rewriting. However, using these conversions for the evaluation of expressions
may fail for various reasons. This section details why they do not work
properly, and how we propose to fix these problems. Let us first recall that officially,
HOL terms are $\lambda$-terms with constants. This means terms are either variables,
constants, applications or abstractions:


$$\mathit{Term} ::= v \mid c \mid T_1\ T_2 \mid \lambda v.\,T$$


Bottom-Up Evaluation 


Since a term may contain several redexes, we have to choose a strategy
determining the order in which they are reduced.

HOL’s rewriters use a top-down rewriting strategy. That is, they first try 
to make simplifications at the top of the term. When all simplifications at the 
top have been done, the sub-expressions are simplified recursively. Since this 
can create redexes at the top of the term, one then has to simplify the whole 
expression. 

This strategy is not bad for symbolic computations since it tries first to
simplify the largest expressions, which may make subterms disappear without
wasting time reducing them. The drawback of this strategy is that it applies
the same simplifications several times at the same subterm. Moreover, some
simplification rules may duplicate some arguments (and all the redexes in them).
Therefore, we may make the same computation several times, as in a call-by-name
strategy.

The idea of computation is rather to evaluate with a bottom-up strategy,
first putting the sub-expressions in a given canonical form (their values), which
creates redexes at the upper level. This is closer to the situation of programming
languages such as ML, which implement a call-by-value strategy.

We exclude lazy strategies as they are generally implemented with side effects. 
And since the primitive rules of HOL are purely applicative, we cannot reuse all 
the work done in compilation of lazy languages. 


Laziness for Conditional and Elimination Constants 


Another problem with HOL’s usual rewriting strategies appears
with the conditional, which is defined by the clauses


$$\vdash \mathrm{COND}\ T\ t_1\ t_2 = t_1 \qquad \text{and} \qquad \vdash \mathrm{COND}\ F\ t_1\ t_2 = t_2.$$

The condition and both branches are all simplified, and then, depending on
the value of the boolean, one or the other branch is dropped. This is particularly
inefficient in the case of recursive functions. It can even make the simplification
process loop.

As an example, consider the power function. It can be defined by primitive 
recursion over the exponent, and we can use the following theorem as a definition: 


pow x n = if n = 0 then 1 else x * pow x (n-1) 


Let us look at the successive steps that REWRITE_CONV (and actually any simple
bottom-up strategy) would perform, omitting uninteresting steps:


pow 10 0
if 0 = 0 then 1 else 10 * pow 10 (0-1)
if T then 1 else 10 * (if (0-1) = 0 then 1 else 10 * pow 10 ((0-1)-1))


In the first step, we simply use the definition of power. This produces a
conditional whose first argument is neither T nor F, thus a top-down strategy will
simplify all three arguments before trying to simplify again the conditional. In
the second step, we can see that the condition reduces to T, but we still have
to simplify the other two arguments. Furthermore, the else branch contains a
recursive call, and the definition of power can be expanded again. It is obvious
this will never terminate.
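
The same phenomenon can be reproduced in any strict language by treating the conditional as an ordinary function. The Python sketch below is an analogy, not HOL code: `cond_strict` receives its branches already evaluated, so the recursive call is always unfolded, whereas `cond_lazy` receives thunks and forces only the selected branch. All four names are introduced here for illustration.

```python
def cond_strict(b, t, e):
    # Python has already evaluated b, t and e when this body runs.
    return t if b else e


def power_strict(x, n):
    # The recursive call sits in an argument, so it is always evaluated.
    return cond_strict(n == 0, 1, x * power_strict(x, n - 1))


def cond_lazy(b, t, e):
    # Branches are thunks; only the selected one is forced.
    return t() if b else e()


def power_lazy(x, n):
    return cond_lazy(n == 0, lambda: 1, lambda: x * power_lazy(x, n - 1))
```

Calling `power_strict(10, 0)` never reaches the base case and ends with a `RecursionError`, while `power_lazy(10, 3)` returns 1000.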

HOL’s reduceLib library was designed (among other things) to avoid this problem
and to allow the usual boolean and numerical expressions to be computed efficiently.
However, this mechanism is not general in the sense that it cannot deal with other
user-defined constants, and the same effort will have to be made whenever we
introduce constants such as datatype destructors. Consider for instance the
option type. In the expression option_case e f (SOME x), which reduces to f x,
we would like to avoid computing the canonical form of e (the case for NONE).
Again, this may dramatically slow down the evaluation process, and even make
it loop.

Thus, we definitely want to avoid wiring into the simplifier special handling for
a given constant. Instead, we prefer providing a more general feature,
where the user has a simple way of declaring which arguments of a constant must
be evaluated lazily. Most ML programmers will feel more familiar with this kind
of analysis than with “hacking” their own conversion. This contradicts our first intention
of implementing a call-by-value strategy. Thus, we make a distinction between
reductions associated to a constant done by pattern-matching, where arguments
need to be evaluated eagerly, and $\beta$-reduction, which we will reduce with a
call-by-name strategy. Thanks to that, one can control eagerness of evaluation by
choosing one or the other of these two equivalent forms:


$$\vdash M\ x = N \qquad \text{and} \qquad \vdash M = \lambda x.\,N$$


where $x$ does not appear free in $M$. In the first case, the argument of $M$ will be
evaluated eagerly, while in the second one, $M$ is reduced on the spot, returning
a $\lambda$-abstraction that will evaluate its argument only if needed.

In the case of the conditional, the following reduction specification is to be
preferred to the one given earlier:


$$\vdash \mathrm{COND}\ T = \lambda t_1.\,\lambda t_2.\,t_1 \qquad\qquad \vdash \mathrm{COND}\ F = \lambda t_1.\,\lambda t_2.\,t_2$$


Weak Reduction 


A weak reduction (no reduction under abstractions) is more desirable. As the 
example below shows, evaluating the body of an abstraction may have serious 
effects on efficiency: 


fun pow2 0 = 1 
  | pow2 n = let val y = pow2 (n-1) in y+y end;


The partially evaluated form below has terrible performance, since the time complexity
of the algorithm evaluated with a call-by-value strategy becomes exponential¹.


fun pow2 0 = 1 
| pow2 n = (pow2 (n-1))+(pow2 (n-1)); 


We prefer to let the user decide which form is the most efficient and then
interpret a function under the form provided. Experienced users of strict functional
languages will make this optimization by hand very easily.
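
The difference between the two forms can be measured by counting calls. The following Python sketch transcribes both ML definitions; the names `pow2_shared` and `pow2_dup` and the explicit `counter` argument are additions made only for this illustration.

```python
def pow2_shared(n, counter):
    # Transcription of the let form: the recursive result y is computed once.
    counter[0] += 1
    if n == 0:
        return 1
    y = pow2_shared(n - 1, counter)
    return y + y


def pow2_dup(n, counter):
    # Transcription of the partially evaluated form: the call is duplicated.
    counter[0] += 1
    if n == 0:
        return 1
    return pow2_dup(n - 1, counter) + pow2_dup(n - 1, counter)
```

Under Python's call-by-value evaluation, the first form makes n + 1 calls while the second makes 2^(n+1) - 1.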

On the other hand, it would be annoying not to be able to compute an
expression just because it is hidden by, say, a universal quantification. So, we
implement a strong reduction, but the latter is used only when necessary, that
is, when we know the abstraction will not be applied any more.

All these remarks will be summarized when describing the transitions of
the abstract machine. Now, we explain how a reduction function can be
implemented by an abstract machine and adapted to produce a theorem.


3 Abstract Machines 


In this section, we describe how normalization functions can be implemented 
with abstract reduction machines, such as Krivine’s abstract machine (KAM, 
see [7]). To describe an abstract machine we give ourselves a set of states and a 
transition relation between these states. Then, the normalization process is the 
following: 


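$$t \;\xrightarrow{\ \mathit{inj}\ }\; S_0 \longrightarrow S_1 \longrightarrow \cdots \longrightarrow S_n \;\xrightarrow{\ \mathit{proj}\ }\; \mathrm{nf}(t)$$


Given a term $t$, we inject it into an initial state. Applying all transitions possible
on this state, the abstract machine reaches a final state $S_n$, and a projection
method (not necessarily the inverse of injection) gives back a term, which should
be the normal form of $t$.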

There are several good properties we can expect from an abstract machine.
Firstly, if computing the next step is an atomic operation (in the sense that
its execution time can be bounded independently of the size of its input), then
studying the number of steps to the normal form gives us the time complexity
of the interpreter². Secondly, it can be implemented by a tail-recursive function,
which ensures that the size of the input it can deal with is limited only by the
size of the process, and not the fixed (small) size allocated to the execution stack.


1 With a call-by-name strategy, both are exponential.


Terms with Focus 


We introduce terms with focus. The focus is a subterm we can mutate easily. We
will write the focused term between square brackets (e.g. $(\lambda x.\ [f\ x])\ y$), and $C(t)$
simply denotes a term which admits $t$ as a subterm. We can consider several
operations on these terms with focus (we will consider other operations later
on):


$$\begin{array}{lrcl}
\text{Zoom in:} & C[a\ b] & \longrightarrow & C([a]\ b) \\
\text{Zoom out:} & C([a]\ b) & \longrightarrow & C[a\ b] \\
\text{Beta:} & C[(\lambda x.\,m)\ b] & \longrightarrow & C[m\{b/x\}]
\end{array}$$


These operations are of two kinds: Beta actually reduces the term, whereas Zoom 
in and out simply move the focus within the term. 
An easy algorithm to put a term in head normal form is the following: 


— if the focus is an application, then zoom in and resume; 
— if the focus is an abstraction, then zoom one step out, reduce and resume. 
— otherwise, zoom out completely and stop. 


The injection function is simply $(t \mapsto [t])$, and the projection is the last step
above. The machine stops when a variable applied to arguments or a non-applied
abstraction is found: this is a head normal form.

Let us remark that Beta is not atomic, because of the substitution. We now 
show an abstract machine that solves this problem by using closures. 


Krivine’s Abstract Machine 


The KAM is a very simple machine that computes the weak head normal form of
pure $\lambda$-terms (without constants), which means we only have to look for redexes
in the left subterm of applications. Therefore, we can represent our terms with
focus by a pair of the focused term and its arguments: $[t]\ ts$ can be represented
by $(t, ts)$, which is the pair of a term $t$ and a list of terms $ts$.

If we want the steps to be atomic, we must be able to delay substitution,
because of the $\beta$ step. Hence the idea of replacing terms with closures, a pair of
an environment $e$ and a term $t$. The environment itself is a list of closures³, and
maps any de Bruijn index into a closure, acting as a delayed substitution, which
is propagated as the code (the term) is executed.
The transitions of the KAM are given below:


$$\begin{array}{lcl}
(e,\ a\ b,\ S) & \longrightarrow & (e,\ a,\ (e, b) :: S) \\
(e,\ \lambda m,\ C :: S) & \longrightarrow & (e.C,\ m,\ S) \\
(e.(e', t),\ 0,\ S) & \longrightarrow & (e',\ t,\ S) \\
(e.C,\ n+1,\ S) & \longrightarrow & (e,\ n,\ S)
\end{array}$$


KAM


2 This requirement is not necessary: this is just a methodological constraint that urges
the designer to split the expensive transitions and think how he can factorize them,
but does not solve the problem per se.

3 As in the literature, we use the de Bruijn indices [9] for bound variables. A
name-carrying presentation might have been given, in which case an environment would
be a mapping from variable names to closures, see [8].


There are three kinds of transitions: the first rule moves the focus in search 
of a new redex, the second one actually reduces the term, and the other ones 
propagate substitutions. 
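
The four transitions can be transcribed almost literally. The sketch below is a minimal Python rendering under assumptions made here: terms are tagged tuples with de Bruijn indices, an environment is either `None` (empty) or a pair `(closure, rest)`, and the stack is a Python list whose top is its last element. It is meant to show the shape of the machine, not the implementation described in this paper.

```python
def var(n):
    return ("var", n)


def lam(body):
    return ("lam", body)


def app(f, a):
    return ("app", f, a)


def kam(term):
    """Run the KAM transitions until none applies.

    Returns (env, term, stack, steps)."""
    env = None      # empty environment; otherwise a pair (closure, rest)
    stack = []      # pending argument closures, top of stack at the end
    steps = 0
    while True:
        tag = term[0]
        if tag == "app":                   # (e, a b, S) -> (e, a, (e,b)::S)
            stack.append((env, term[2]))
            term = term[1]
        elif tag == "lam" and stack:       # (e, \m, C::S) -> (e.C, m, S)
            env = (stack.pop(), env)
            term = term[1]
        elif tag == "var" and env is not None:
            closure, rest = env
            if term[1] == 0:               # (e.(e',t), 0, S) -> (e', t, S)
                env, term = closure
            else:                          # (e.C, n+1, S) -> (e, n, S)
                env, term = rest, var(term[1] - 1)
        else:
            return env, term, stack, steps
        steps += 1
```

Because arguments are pushed unevaluated, applying the constant function to a diverging argument still terminates: the argument closure is stored in the environment and never entered.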


Extension to Strong Reduction 


Crégut [7] extends this machine to KN, to perform strong reductions. But this
is still a call-by-name strategy. If we want to reduce the argument of a constant,
we must be able to focus on it. Hence we cannot represent our terms with focus
the same way. Instead of a list of terms, the second component of the state will
be a stack, following the syntax


$$S ::= [\,] \mid @_L(t).S \mid @_R(t).S \mid \lambda(x).S$$


where $t$ is a term and $x$ a variable.

It describes a path in the term, from the occurrence to the top, keeping track
of the other sons of a node. The first constructor means we are at the top of the
term. The next two are used when going under an application (respectively left
and right), and the last one to cross an abstraction and do strong reductions.

For instance, the term $(\lambda x.\ [f\ x])\ y$ would be represented by $(f\ x,\ \lambda(x).@_L(y).[\,])$.


Computing with Constants (Pattern-Matching) 


The KAM deals with $\beta$-reduction. But we also want to rewrite with theorems defining
the computational behavior of the constants. The simplest kind of rewrite we
can use is a mere definition, i.e. a theorem showing that a constant $c$ is equal
to its definition. We interpret a theorem $\vdash c = M$ as introducing the reduction
rule $c \triangleright M$.

Mathematicians and ML programmers are used to defining functions using
patterns, assigning the result for all the inputs that match these patterns. A
function is generally totally defined by giving a finite set of clauses. The example
of addition given in the introduction is an illustration of that way of defining
functions by pattern-matching. For instance, a theorem $\vdash \forall x.\ c\ x = M$ will be
interpreted as the rule $c\ N \triangleright M\{N/x\}$. It differs slightly from $c \triangleright \lambda x.\,M$ because,
in the former case, reduction can occur only when $c$ is applied to at least one
argument.

Furthermore, one can define a function whose domain is a recursive type
by giving a clause for every constructor. For instance, a function over naturals
is totally and uniquely defined by giving two clauses: one for 0 thanks to a
theorem of the form $\vdash c\ 0 = P$ and another one for the successor case with a
theorem like $\vdash c\ (S\ n) = Q$.

We see that we need to guess how to instantiate the free variables of the left-hand
side of the equation we rewrite with. This operation is called pattern-matching.
Not every form should be allowed for the left-hand sides (the patterns). Since our
intent is to emulate a language like ML, we adopt the same kind of restrictions
as for ML patterns. That is, they should follow a restricted syntax:


$$\mathit{Pattern} ::= x \mid c\ p_1 \ldots p_n \qquad \text{where } p_1, \ldots, p_n \in \mathit{Pattern}.$$


Left-hand sides must be patterns not reduced to a variable. Note that unlike ML
we do not restrict to linear patterns. The same variable may occur several times in the
pattern, and matching succeeds if the corresponding terms are $\alpha$-equivalent⁴. There is
no restriction on the right-hand side. However, free variables not occurring in
the pattern will not be instantiated.
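
A matcher for this restricted pattern syntax, including non-linear patterns, fits in a few lines. The Python sketch below rests on conventions chosen here: a pattern variable is a string, a constant applied to arguments is a tuple whose first component is the constant's name, and structural equality stands in for $\alpha$-equivalence since these first-order terms contain no binders.

```python
def match(pattern, term, subst=None):
    """Return a substitution making pattern equal to term, or None."""
    subst = {} if subst is None else subst
    if isinstance(pattern, str):           # pattern variable
        if pattern in subst:
            # Non-linear pattern: repeated variables must agree.
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    # pattern is a constant applied to sub-patterns: (c, p1, ..., pn)
    if (not isinstance(term, tuple) or len(term) != len(pattern)
            or term[0] != pattern[0]):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst
```

For example, the pattern `("+", ("S", "n"), "m")` matches the term `("+", ("S", ("O",)), ("O",))` by binding both `n` and `m` to `("O",)`, and a pattern such as `("eq", "x", "x")` only matches terms whose two arguments are equal.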

We have to extend the transition rules of the KAM in order to take constants 
into account. As usual, arguments of applications are stored in the stack. When 
a constant is found, we look in the simplification database whether the term can 
be rewritten on the spot. If it can be simplified, we resume reduction on the 
residue. Otherwise, arguments in the stack are (weakly) reduced one by one and 
applied to the constant. Every time an argument is applied, we check whether the
resulting term can be rewritten by a theorem of the database.


4 The Inference Rules 


If we want to be able to build a theorem corresponding to any reduction path,
we could use rules that correspond to the usual definition: $\beta\eta$-reduction is the
smallest reflexive, transitive and congruent (with respect to application and
abstraction) relation containing the rules $\beta$ and $\eta$. Thus, the rules REFL, TRANS (reflexivity
and transitivity of equality), MK_COMB, ABS (equality is congruent with respect
to application and abstraction), BETA_CONV and ETA_CONV (one-step $\beta$ and $\eta$
conversions) are enough to fulfill our goal.

However, they are not the best choice of atomic rules to build a derivation,
because they perform useless operations such as type-checking and $\alpha$-conversion
tests: the one-step rules BETA_CONV, ETA_CONV and REFL compute the type of their
input term and TRANS performs an $\alpha$-conversion test. In principle, we should be
able to avoid type-checking completely because of the subject-reduction property
(the reduct of a well-typed term is a well-typed term with the same type). The idea
to avoid these redundancies is to start from the empty reduction path (yielded
by reflexivity); the rules then simply add steps to the current path or give access
to the subterms. According to this, transitivity would not be necessary any more,
but we keep it to extend the current path with an arbitrary theorem in order to
do rewrites regarding constants, as for the addition example of the introduction.


4 But since we only perform weak reduction on the arguments of a constant, we may
miss some simplifications: $\lambda x.\ (\lambda y.\ y)\ x$ and $\lambda x.\ x$ would not be identified.

In figure 1, we propose a new set of inference rules more convenient for
building derivations. Let us first explain the unusual form of rules MK_COMB
and MK_ABS. Rule MK_COMB can be read in another way, similar to a tactic:


$$\Gamma \vdash t = a\ b \;\longmapsto\; \left(\ \vdash a = a,\quad \vdash b = b,\quad R_1 : \dfrac{\Gamma_1 \vdash a = a' \qquad \Gamma_2 \vdash b = b'}{\Gamma \cup \Gamma_1 \cup \Gamma_2 \vdash t = a'\ b'}\ \right)$$


That is, given a derivation ending on an application (the goal), rule MK_COMB
returns two empty derivations starting from the subterms (these are the
subgoals), and a rule $R_1$ that completes the initial derivation from the completed
derivations of the subterms (this is the validation). The presentation of the figure
respects the shape of the final derivation and the ellipses are the parts still to
be built.


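$$\dfrac{\Gamma \vdash t_1 = t_2 \qquad \Gamma' \vdash t_2 = t_3}{\Gamma \cup \Gamma' \vdash t_1 = t_3}\ (\text{TRANS})$$

$$\dfrac{\Gamma \vdash t = a\ b \qquad \begin{array}{c} \vdash a = a \\ \vdots \\ \Gamma_1 \vdash a = a' \end{array} \qquad \begin{array}{c} \vdash b = b \\ \vdots \\ \Gamma_2 \vdash b = b' \end{array}}{\Gamma \cup \Gamma_1 \cup \Gamma_2 \vdash t = a'\ b'}\ (\text{MK\_COMB})$$

$$\dfrac{\Gamma \vdash t = \lambda x.\,m \qquad \begin{array}{c} \vdash m = m \\ \vdots \\ \Gamma' \vdash m = m' \end{array}}{\Gamma \cup \Gamma' \vdash t = \lambda x.\,m'}\ (\text{MK\_ABS}) \quad (x \notin FV(\Gamma'))$$

$$\dfrac{\Gamma \vdash t = (\lambda x.\,m)\ b}{\Gamma \vdash t = m\{b/x\}}\ (\beta) \qquad\qquad \dfrac{\Gamma \vdash t = \lambda x.\,m\ x}{\Gamma \vdash t = m}\ (\eta) \quad (x \notin FV(m))$$

$$\dfrac{}{\vdash t = t}\ (\text{REFL}) \qquad\qquad \dfrac{\Gamma \vdash t = t'}{\Gamma \vdash_{HOL} t = t'}\ (\text{ACCEPT})$$


Fig. 1. Derived rules used by the conversion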


An important remark is that we introduced a new ML datatype to represent
derivations. It is close to HOL’s theorems. The notation $\vdash$ is used to represent our
theorems and $\vdash_{HOL}$ for HOL’s. This new datatype is also abstract and values
can be built only using a small set of inference rules (those of fig. 1). However,
the goal is to build a HOL theorem in the end (the projection step of abstract
machines), hence the ACCEPT rule. Apart from these rules, we only need a
function to access the right-hand side of a theorem. The latter represents the
end of the current reduction path and is used to decide which is the next step
to perform.

Note that the rules are trivially derivable in HOL’s logic, so that $\vdash$ can be
implemented by $\vdash_{HOL}$. As a first step, one can forget this distinction. However,
we will show that for efficiency reasons, it may be useful to consider alternative
representations for $\vdash$. For instance, we already noticed that the transitivity rule
of HOL performs an $\alpha$-conversion test that would make this rule non-atomic. We
succeeded in making these rules atomic for $\vdash_{HOL}$, by implementing several derived
rules more efficiently, and changing the internal representation of terms.


Using Pointer Equality 


The “validations” of rules MK_COMB and MK_ABS have to check that their inputs
are reduction paths starting from the subterms. Usual HOL rules would test for
$\alpha$-convertibility, which ruins the constant time requirement. Instead of this, we
test for pointer equality.
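
The combination of an abstract type of derivations and pointer-equality checks can be sketched as follows. This Python illustration is an analogy under assumptions made here: the class `Thm`, the private token `_KEY` and the functions `refl` and `mk_comb` are hypothetical names, Python only enforces the abstraction by convention, and hypotheses are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


_KEY = object()  # private token: only the rules below may build a Thm


class Thm:
    """A reduction path lhs = rhs."""
    __slots__ = ("lhs", "rhs")

    def __init__(self, lhs, rhs, key=None):
        if key is not _KEY:
            raise TypeError("theorems can only be built by inference rules")
        self.lhs, self.rhs = lhs, rhs


def refl(t):
    """REFL: the empty path t = t."""
    return Thm(t, t, _KEY)


def mk_comb(th, th_a, th_b):
    """From t = a b, a = a' and b = b', derive t = a' b'."""
    ab = th.rhs
    # Pointer equality ('is') replaces an alpha-convertibility test.
    if (not isinstance(ab, App) or th_a.lhs is not ab.fun
            or th_b.lhs is not ab.arg):
        raise ValueError("sub-derivations do not start from the subterms")
    return Thm(th.lhs, App(th_a.rhs, th_b.rhs), _KEY)
```

The identity test is constant time, at the price of rejecting a structurally equal but physically distinct copy of a subterm. That is acceptable here because the sub-derivations are always started from the very subterms handed out by the rule.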


Terms with Explicit Substitutions 


$\beta$-reduction has to perform a substitution, which takes as much time as the
size of the function body. We propose to implement $\lambda$-terms using explicit
substitutions. However, they are not mentioned in the interface: terms are still
used as simple $\lambda$-terms with constants. This is essential for compatibility reasons,
and because we consider delaying substitution more as a requirement for the
efficiency of the implementation than as a fundamental construction (one can
already delay atomic substitution thanks to the LET constant).

In our proposition (implemented in the last release of HOL), the internal
representation of terms uses de Bruijn indices (as before) and explicit
substitutions. A term is either a bound variable, a free variable, a constant, an application,
an abstraction or a closure:


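$$\begin{array}{rcl}
t & ::= & n \mid x \mid c \mid t_1\ t_2 \mid \lambda x.\,t \mid [e]t \\
\mathit{Env} & ::= & id \mid e.t \qquad \text{where } e \in \mathit{Env}
\end{array}$$


Abstractions bear a variable to record a printable name, and also give a type
for the domain. Equality modulo substitution propagation is called $\sigma$. Terms are
therefore identified modulo $\alpha\sigma$-conversion.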

We provide a new $\beta$-reduction function that uses explicit substitutions (it
simply issues an explicit substitution instead of actually performing it). The
propagation is performed lazily as the term is decomposed, but it is also possible
to force all the substitutions in a term. This modification has no cost for those
who do not use these new functions.

We are not going to describe in detail the implementation of explicit
substitutions. As a hint, one can think of substitutions as either the identity, or the
parallel composition of a substitution with a new binding, as the above definition
suggests. The de Bruijn representation does require other constructors to deal
with lifting of indices.

Technically, let us just mention that it implements a particular strategy of
$\lambda\sigma$ [1] that avoids non-termination (see [12]) by eagerly doing substitution
composition (thus we do not need any explicit composition operator in the syntax
of substitutions).

A relevant point is that it allows a $\beta$-reduction in constant time, for it simply
introduces a substitution operator. As a further benefit, substitutions can be
composed. The point of substitution composition appears when reducing a term
with many $\beta$-redexes: the substitution is done in the body only when all the
arguments have been provided. For instance, in the following reduction path,
each step takes constant time:


$$\begin{array}{l}
(\lambda xyz.\,m)\ a\ b\ c \\
\to ([a]\lambda yz.\,m)\ b\ c \\
\to ([a.b]\lambda z.\,m)\ c \\
\to [a.b.c]m
\end{array}$$


And then we can propagate the compound (parallel) substitution in $m$. The
usual approach would yield three successive substitutions in $m$. But there is
worse: the first step would substitute $a$ for $x$ in $m$. Then, in
the second step, we would substitute $b$ for $y$, including in the occurrences of $a$
that appeared in $m$ during the first step. This is useless since we know $y$ cannot
occur in $a$. As we will make more precise later on, this has a major impact on
the complexity.
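
The cost difference between three successive substitutions and one compound substitution can be observed on a toy example. The Python sketch below uses named variables and a naive substitution function that counts visited nodes; it illustrates the argument above and is not the explicit-substitution machinery itself. The terms `m`, `a`, `b` and `c` are hypothetical, with `a`, `b` and `c` closed.

```python
def subst(term, env, visits):
    """Apply the parallel substitution env to term, counting visited nodes."""
    visits[0] += 1
    if isinstance(term, str):              # a variable or a constant name
        return env.get(term, term)
    return tuple(subst(t, env, visits) for t in term)


m = ("f", "x", "y", "z")                   # body of \x y z. m
a, b, c = ("g", "p", "q"), ("h", "r"), "k"  # closed arguments (hypothetical)

seq = [0]
step1 = subst(m, {"x": a}, seq)
step2 = subst(step1, {"y": b}, seq)        # also walks through the copy of a
sequential = subst(step2, {"z": c}, seq)   # walks through a and b again

par = [0]
parallel = subst(m, {"x": a, "y": b, "z": c}, par)
```

Both routes yield the same term, but the successive substitutions revisit the inserted copies of `a` and `b`, while the compound substitution visits each node of `m` exactly once.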

Note that an n-ary $\beta$-reduction step would still not solve the problem since
the inner redexes may result from arbitrarily complex reductions, as in


$$(\lambda x.\ id\ (\lambda yz.\,m))\ a\ b\ c$$


where id is the identity function. As a first step, only the redex on x can be 
reduced. Then, redexes on y and z appear from the reduction of id. 


5 The Abstract Machine of computeLib 


Description of the Machine 


The state of the machine is a “theorem with focus”, that is the pair of a theorem 
and a stack. The difference with the datatype of stack presented in section 3 is 
that it contains theorems instead of terms. 
The injection simply maps a term into the empty derivation (thanks to the
REFL rule):

$$ t \mapsto (\vdash t = t,\ []) $$


Here, we assume the transitions (to be described in the next subsections) 
will eventually lead to a state where the stack is [] again, so that the projection 
step is the first projection. As we said when discussing the strategy, we favor 
weak reductions. This is achieved by having two sets of transitions W and S. The 
first one only does weak reductions, and the second one makes strong reductions 
whenever no weak reductions are possible. 
Convention of the Transition Rules


One crucial point for an efficient implementation of a call-by-value strategy is
that values appearing in environments are already evaluated. Hence, we need to
make a clear distinction between [t]0 where we know t is already fully evaluated,
and [id]t where t may still contain redexes.

The idea is to have a convention such that terms without closures represent 
evaluated forms and [e]t denotes the unevaluated program t in environment e. 
Thus, the exact injection function is: 


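$$ \mathit{inj} = (t \mapsto (\vdash [id]t = [id]t,\ [])) $$

We can define strong and weak normal forms and closures as subsets of terms:

$$
\begin{array}{lcl}
\mathit{Snf}  & ::= & n\,\vec{v} \mid x\,\vec{v} \mid c\,\vec{v} \mid \lambda x.v \\
\mathit{Wnf}  & ::= & n\,\vec{w} \mid x\,\vec{w} \mid c\,\vec{w} \mid [e]\lambda x.m \\
\mathit{Clos} & ::= & n\,\vec{w} \mid x\,\vec{w} \mid c\,\vec{w} \mid \lambda x.v \mid [e]t
\end{array}
$$

with $\vec{v}$ (resp. $\vec{w}$) a list of elements of Snf (resp. Wnf), t containing
no substitution, and, in Snf and Wnf, no rule matching $c\,\vec{v}$ or $c\,\vec{w}$.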

An invariant of the machine will be that any term appearing in the state
belongs to Clos and has no unbound de Bruijn index (the closure operator
being a binder). This is true for the initial state, and every transition preserves
this invariant.

But since explicit substitutions were hidden, the term manipulation
primitives only give us access to the fully substituted form of a term, and we
cannot discriminate the two states above. In the implementation, every theorem
carries external information about the actual form of the right hand side. For
every transformation we apply to theorems, we are able to do the same on the
annotation, so that it is always consistent. Thanks to that, we can proceed as if
we could access the internal representation. This duplicates the operations, but
considerably improves portability since it makes fewer assumptions about the
exact datatype of the theorem prover.

As an informal notation, we will write theorems using the internal syntax of
terms (section 4). In fact, the datatype used in the implementation is not exactly
the same. For instance, it has an n-ary representation of application, for faster
access to the head of an application, and constants are annotated with the set of
rewrites that apply to them, avoiding lookups during the reduction.


Transitions for Weak Reduction 


The transitions in figure 2 are adapted from the KAM. Considering only right
hand sides of theorems, the first four rules look very similar to the KAM. Note
that in rule 2, theorem ⊢ t = [e.b′]m is obtained by applying R_t to theorems
⊢ a = [e]λx.m and ⊢ b = b′, which builds ⊢ t = ([e]λx.m) b′. Then, applying
rule β yields ⊢ t = [e.b′]m.
The next two rules show that substitutions operate only on de Bruijn indices.
Rule 7 is the one that actually performs rewriting, since it replaces a term
matching the left hand side of a rewrite with the instantiated right hand side.
We define Match as a partial function mapping any term t to a rewrite of Σ and
a substitution that identifies t with the left hand side of the equation:

$$ \mathrm{Match}_\Sigma(t) = (\vdash l = r,\ e) \quad \text{if } (\vdash l = r) \in \Sigma \;\wedge\; [e]l =_\alpha t $$

This matching side-condition can be considered atomic since its time complexity
only depends on the size of the rewrite set, but not on the size of the term to be
reduced. Proposition $\mathrm{NoMatch}_\Sigma(t)$ is true whenever the above partial
function has no result.
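As an illustration of such a partial matching function, here is a minimal sketch in Python. It is not the computeLib code: the tuple encoding of terms and the names `match` and `find_rewrite` are assumptions made for this example, and it works on plain first-order patterns only.

```python
# Toy terms: ("var", name) | ("const", name) | ("app", f, a).
# Pattern variables are the ("var", _) nodes of a left hand side.

def match(pattern, term, subst=None):
    """Return a dict binding pattern variables so that the instantiated
    pattern equals term, or None when no such substitution exists."""
    subst = {} if subst is None else subst
    if pattern[0] == "var":
        bound = subst.get(pattern[1])
        if bound is None:
            subst[pattern[1]] = term
            return subst
        return subst if bound == term else None    # non-linear pattern
    if pattern[0] == "const":
        return subst if pattern == term else None
    if term[0] != "app":                           # pattern is an application
        return None
    if match(pattern[1], term[1], subst) is None:
        return None
    return match(pattern[2], term[2], subst)

def find_rewrite(rewrites, term):
    """Partial function: the first (lhs, rhs, substitution) whose left hand
    side matches term, or None (the NoMatch case)."""
    for lhs, rhs in rewrites:
        s = match(lhs, term, {})
        if s is not None:
            return lhs, rhs, s
    return None

def app(f, *args):
    """Build the curried application f a1 ... an."""
    for a in args:
        f = ("app", f, a)
    return f

ZERO, SUC, ADD, EQ = ("const", "0"), ("const", "SUC"), ("const", "+"), ("const", "=")
m, n = ("var", "m"), ("var", "n")
rewrites = [
    (app(ADD, ZERO, n), n),
    (app(ADD, app(SUC, m), n), app(SUC, app(ADD, m, n))),
]
hit = find_rewrite(rewrites, app(ADD, app(SUC, ZERO), ZERO))
```

The cost of `find_rewrite` is bounded by the size of the patterns in the rewrite set, which is the property that lets the side-condition be treated as atomic.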


The next two rules start reducing the next argument of a partially applied
variable or constant, when no reduction is possible. This is achieved by simply
swapping the head and the argument (and $@_l$ becomes $@_r$). The last rule applies
when this argument has been weakly normalized, in which case it is applied to
the head appearing in the stack. One simply has to use R_t.


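$$
\begin{array}{lcl}
(\vdash t = [e](a\ b),\ S) & \to & (\vdash [e]a = [e]a,\ @_l(\vdash [e]b = [e]b,\ R_t).S) \\
(\vdash a = [e]\lambda x.m,\ @_l(\vdash b = b',\ R_t).S) & \to & (\vdash t = [e.b']m,\ S) \\
(\vdash t = [e.a]0,\ S) & \to & (\vdash t = a,\ S) \\
(\vdash t = [e.a](n+1),\ S) & \to & (\vdash t = [e]n,\ S) \\
(\vdash t = [e]x,\ S) & \to & (\vdash t = x,\ S) \\
(\vdash t = [e]c,\ S) & \to & (\vdash t = c,\ S) \\
(\vdash t = c\,\vec{w},\ S) & \to & (\vdash t = [e]r,\ S) \quad \text{when } \mathrm{Match}_\Sigma(c\,\vec{w}) = (\vdash l = r,\ e) \\
(\vdash a = x\,\vec{w},\ @_l(\vdash b = b',\ R_t).S) & \to & (\vdash b = b',\ @_r(\vdash a = x\,\vec{w},\ R_t).S) \\
(\vdash a = c\,\vec{w},\ @_l(\vdash b = b',\ R_t).S) & \to & (\vdash b = b',\ @_r(\vdash a = c\,\vec{w},\ R_t).S) \quad \text{when } \mathrm{NoMatch}_\Sigma(c\,\vec{w}) \\
(\vdash b = b',\ @_r(\vdash a = a',\ R_t).S) & \to & (\vdash t = a'\ b',\ S) \quad \text{when } b' \in \mathit{Wnf}
\end{array}
$$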


Fig. 2. Weak transitions of the machine (W) 


Several rules have side-conditions. Actually checking a condition like b′ ∈
Wnf would make the step non-atomic because the complexity of this operation
is at least linear in the size of b′. We avoid this costly operation by ordering
the rules: we first try every rule except the last one. If none of these rules
applies, this means we have a weak normal form and the last rule of figure 2 can
be applied without actually checking the side-condition.


Applying all the rules of figure 2 leads to a state with an empty stack and 
where the right hand side of the theorem belongs to Wnf. 
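To give a feeling for the KAM-like part of these transitions, here is a minimal sketch in Python. It works on bare de Bruijn terms rather than theorems, covers only the analogue of the first six rules (no rewriting and no normalization of the arguments of a stuck head), and its term encoding and the name `whnf` are assumptions made for this illustration.

```python
# de Bruijn terms: ("idx", n) | ("lam", body) | ("app", f, a) | ("const", name)
# Environments are linked lists: None (empty) or (closure, rest),
# where a closure is a pair (term, environment).

def whnf(term):
    """Weak head reduction of a closed term.
    Returns (head, env, stack); stack holds the pending argument closures."""
    env = None
    stack = []                                   # allocated on the heap
    while True:
        tag = term[0]
        if tag == "app":                         # push the argument, focus on the head
            stack.append((term[2], env))
            term = term[1]
        elif tag == "lam" and stack:             # beta step: extend the environment
            env = (stack.pop(), env)
            term = term[1]
        elif tag == "idx":
            closure, rest = env
            if term[1] == 0:                     # [e.a]0     -> a
                term, env = closure
            else:                                # [e.a](n+1) -> [e]n
                term, env = ("idx", term[1] - 1), rest
        else:                                    # constant or unapplied abstraction
            return term, env, stack

K = ("lam", ("lam", ("idx", 1)))                 # \x y. x
I = ("lam", ("idx", 0))                          # \x. x
a, b = ("const", "a"), ("const", "b")

result = whnf(("app", ("app", K, a), b))
```

Each branch of the loop does a constant amount of work, and the loop falls through to the final branch only when no other rule applies, which mirrors the way the rule ordering replaces an explicit normal-form check.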
Transitions for Strong Reduction


This subsection describes the strong normalization. The goal of this second
machine is to reduce redexes appearing under an abstraction. The first rule of
figure 3 shows that we can always make weak reductions. Notation W(⊢ t = t′)
stands for the first component (a theorem) of the state reached by applying all
the transitions of W on (⊢ t = t′, []).

Rule 2 starts reducing under an abstraction when the weak reduction returned
an unapplied abstraction. A fresh name x′ has to be found to avoid captures.
This side-condition is indeed costly and we could not find a way to avoid it. Note
that this problem would not arise in a real de Bruijn setting.

The next two rules start reducing in the head of an application by pushing
the last argument onto the stack.

The other rules try to rebuild the normal form when the head term is
normalized. We have to look at the stack: if we find λ, the abstraction is rebuilt;
for $@_l$, we start normalizing the next argument on the stack; and for $@_r$, we
have normalized both subterms, so the application can be rebuilt.

Let us explain how we avoid checking side-conditions (the problem of rule 2 
has already been explained): rule 1 is applied once every time a step of S may 
produce a term with redexes (this is the case only for rule 2); the last three rules 
are dealt with as for W, since applying repeatedly the other rules yields a state 
where the right hand side is a strong normal form. 

As with W, applying all the rules of figure 3 leads to a state with an empty
stack and where the right hand side of the theorem belongs to Snf. This is the
strong normal form. To summarize, our conversion is simply the composition of
ACCEPT and S with the injection function:

$$ \mathrm{CBV\_CONV} \;\stackrel{\mathrm{def}}{=}\; \mathrm{ACCEPT} \circ S \circ \mathit{inj} $$


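$$
\begin{array}{lcl}
(\vdash t = t',\ S) & \to & (W(\vdash t = t'),\ S) \quad \text{when } t' \notin \mathit{Wnf} \\
(\vdash t = [e]\lambda x.m,\ S) & \to & (\vdash [e.x']m = [e.x']m,\ \lambda(x',\ R_t).S) \quad \text{when } x' \notin FV([e]\lambda x.m) \\
(\vdash t = x\,\vec{w}\ b,\ S) & \to & (\vdash x\,\vec{w} = x\,\vec{w},\ @_l(\vdash b = b,\ R_t).S) \\
(\vdash t = c\,\vec{w}\ b,\ S) & \to & (\vdash c\,\vec{w} = c\,\vec{w},\ @_l(\vdash b = b,\ R_t).S) \\
(\vdash m = m',\ \lambda(x,\ R_t).S) & \to & (\vdash t = \lambda x.(m'\{0/x\}),\ S) \quad \text{when } m' \in \mathit{Snf} \\
(\vdash a = a',\ @_l(\vdash b = b',\ R_t).S) & \to & (\vdash b = b',\ @_r(\vdash a = a',\ R_t).S) \quad \text{when } a' \in \mathit{Snf} \\
(\vdash b = b',\ @_r(\vdash a = a',\ R_t).S) & \to & (\vdash t = a'\ b',\ S) \quad \text{when } b' \in \mathit{Snf}
\end{array}
$$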


Fig. 3. Strong reduction of the machine (S) 


6 Computing Within the Logic: Some Examples 


The machine described in the previous section was implemented in HOL as li- 
brary computeLib. It contains functions to create simplification sets from theo- 
rems, and a conversion CBV_CONV which normalizes its input term. 
The first application of computeLib was to replace reduceLib. It was
actually straightforward to implement since we only had to collect the theorems
used by reduceLib without taking care of the order in which rewrites should
be applied⁵. Moreover, this new version appears twice as fast as the former
reduceLib.

Now we describe two other applications of computeLib. The first one tests 
our claim that we can program and compute within the logic as we do in ML. 
We chose sorting because this is a recurrent problem in algorithm complexity 
analysis. 


Merge Sort 


We can define all the functions involved in merge sort, and HOL manages to prove 
automatically all the termination proofs (they are primitive recursive functions). 
So, the development in HOL is a mere translation of the ML program. 

It is interesting to notice how well computeLib interacts with bossLib:
starting from the initial simplification set (which defines the computational
behavior of many of the standard functions on booleans, numerals and lists), one
simply has to add the theorems returned by bossLib and get a simplification set
rws to use with CBV_CONV:


CBV_CONV rws (--`merge_sort L38400`--);


where L38400 is a list of 38400 natural numbers between 0 and 4. 
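For readers who want to see the shape of the program being translated, here is a purely applicative merge sort written in Python. It is only a sketch: the paper does not show its ML or HOL definitions, so the function names and the particular splitting scheme are assumptions, and the recursive list copying is kept for clarity rather than speed.

```python
def split(xs):
    """Deal the elements alternately into two lists."""
    if len(xs) < 2:
        return xs, []
    a, b = split(xs[2:])
    return [xs[0]] + a, [xs[1]] + b

def merge(xs, ys):
    """Merge two sorted lists into one sorted list."""
    if not xs:
        return ys
    if not ys:
        return xs
    if xs[0] <= ys[0]:
        return [xs[0]] + merge(xs[1:], ys)
    return [ys[0]] + merge(xs, ys[1:])

def merge_sort(xs):
    """Sort by splitting, sorting both halves recursively, and merging."""
    if len(xs) < 2:
        return xs
    a, b = split(xs)
    return merge(merge_sort(a), merge_sort(b))

# A short list of natural numbers between 0 and 4, in the spirit of L38400.
sample = [(7 * i) % 5 for i in range(100)]
sorted_sample = merge_sort(sample)
```

Each function is defined by equations on the structure of its argument, which is the style of definition that can be read as a set of rewrites.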

We measured the execution times of this algorithm on lists of various lengths.
We compared several execution models: using HOL’s rewriter, CBV_CONV, or ML. 
Indeed, we compared three instantiations of our abstract theorems. Figure 4 is a 
table reporting execution times. The description of the rewriters is the following: 


rewriter: using HOL’s rewriter REWRITE_CONV. Other simplifiers may be a bit 
faster, but the complexity is not better. 

without ES: atomic rules are implemented using usual HOL inference rules 
(without explicit substitutions and with redundant typing). 

with ES: the kernel is implemented with explicit substitutions. 

unsafe: the type of theorems is a mere triple (assumptions, lhs and rhs). 

Moscow ML: the algorithm is expressed directly in the implementation lan- 
guage. 


The first three rewriters are considered safe as they only use features provided
by the current release of HOL98 (which includes explicit substitutions of section 4
in the kernel). The fourth is said to be “unsafe” because the inference rules are
implemented without checking the preconditions (for instance in the transitivity
rule the right hand side of the first theorem and the left hand side of the second
one are not checked to be equal). Misusing these rules would allow the derivation
of a paradox. The conversion appears to use them in a safe way, but this has not
been proven yet.


5 The division and modulo operations were implemented by a special conversion that
resorts to an oracle that computes in ML the quotient and modulo, and then simply
checks the result. We could reproduce this by providing the possibility to add a
conversion to our simplification set. One simply has to specify the name and arity
of the constant the conversion works for.


Size of the list | Rewriter | Without ES | With ES | Unsafe | Moscow ML

[table body not recoverable; surviving entries: 0.043s, 1.57s, 4.3s]


Fig. 4. Timing of several implementations of merge sort 


We can notice that explicit substitutions play a crucial role: they make the
complexity N log N instead of N². Comparing the safe version (with ES) with
Moscow ML, we have a slow down factor of 990 for a size of 1200 and 490 for size
38400. Indeed, our implementation scales better on huge examples,
probably because memory allocation represents most of the overall time in the
case of the execution in ML.

With the unsafe version, more than half of the time is spent in the function 
that does matching and instantiation of theorems, precisely a part of code that 
still could be improved a lot. 

As a final remark, non tail recursive functions fail (due to a stack overflow)
on the example of size 38400, because the number of recursive calls exhausts the
memory space allocated for the execution stack. With a tail recursive
implementation, the interpreter needs a constant amount of memory on the stack.
Instead, the “stack” component of the state plays the role of the execution stack,
but is allocated in the heap.


Various Representations for Theorems 


We instantiated the type of theorems with several different implementations. 
This resulted in differences in efficiency. One can implement it with the classical 
HOL theorems, or a lazier variant that does theorem decompositions only when a 
step has actually been done. This is the approach of Boulton’s lazy theorems [5]. 

One can also try a more efficient representation of equations using a triple
(assumptions, left hand side, right hand side), which allows an even more efficient
implementation of the rules. But a problem arises when implementing ACCEPT.
One can either trust the implementation of these rules, in which case ACCEPT
would produce a theorem without resorting to primitive rules, just as we produce
theorems read from disk, or create a new tag that would mark all the theorems
that rely on our implementation.

On the other hand, one could imagine an even safer representation of theo- 
rems, that would carry an explicit representation of derivations. ACCEPT would 
just consist of dropping the derivation part. 


Application: A Decision Procedure on Polynomials 


We also developed a theory of rings, and adapted the tactic originally written 
by Boutin to normalize polynomial expressions [6] in Coq [4]. 

Let us assume we have a type α and a ring structure (i.e. 0, 1, +, −, * with
the expected properties of commutativity and associativity).

The method consists in defining a datatype α polynom representing polynomial
expressions syntactically:

$$ \mathit{Pol} ::= \mathrm{Var}(i) \mid \mathrm{Cst}(v) \mid P_1 \oplus P_2 \mid P_1 \otimes P_2 \mid \ominus P $$

Having a syntactic representation of a polynomial expression in the logic
allows us to write within the logic a function that can analyze the polynomial.

We define an interpretation function that assigns a value of type α (the target
ring) to every syntactic polynomial expression (of type α polynom) in a
valuation ρ.

$$
\begin{array}{lcl}
[\![\mathrm{Var}(i)]\!]_\rho & = & \rho(i) \\
[\![\mathrm{Cst}(c)]\!]_\rho & = & c \\
[\![P_1 \oplus P_2]\!]_\rho & = & [\![P_1]\!]_\rho + [\![P_2]\!]_\rho \\
[\![P_1 \otimes P_2]\!]_\rho & = & [\![P_1]\!]_\rho \cdot [\![P_2]\!]_\rho \\
[\![\ominus P]\!]_\rho & = & -[\![P]\!]_\rho
\end{array}
$$


Then, we write a HOL function F of type α polynom → α polynom that
returns a simplified version of a syntactic polynomial given as argument. The
simplifications performed are:


— expanding and ordering of monomials in a canonical order, 
— erasing monomials with null coefficient. 


Finally, we prove the correctness of our function, i.e. the interpretation of
our syntactic polynomial by assigning values to every variable is preserved by
our function.

$$ \forall P\ \rho.\ [\![P]\!]_\rho = [\![F(P)]\!]_\rho $$


This is true for any ring. This proof showed a need for tools to write theories 
parameterized by assumptions, such as sections in Coq or locales in Isabelle. 
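The reflection scheme can be sketched outside the logic in a few lines of Python. This is only an illustration under stated assumptions: the ring is taken to be the Python integers, the tuple encoding and the names `interp`, `canon` and `simplify` are hypothetical, and `simplify` merely plays the role of F without following its actual definition.

```python
# Syntactic polynomials:
#   ("var", i) | ("cst", c) | ("add", p, q) | ("mul", p, q) | ("neg", p)

def interp(p, rho):
    """Interpretation of a syntactic polynomial in the valuation rho."""
    tag = p[0]
    if tag == "var":
        return rho[p[1]]
    if tag == "cst":
        return p[1]
    if tag == "add":
        return interp(p[1], rho) + interp(p[2], rho)
    if tag == "mul":
        return interp(p[1], rho) * interp(p[2], rho)
    if tag == "neg":
        return -interp(p[1], rho)
    raise ValueError(tag)

def canon(p):
    """Expand p into a dict: monomial (sorted tuple of variables) -> coefficient."""
    tag = p[0]
    if tag == "var":
        return {(p[1],): 1}
    if tag == "cst":
        return {(): p[1]} if p[1] != 0 else {}
    if tag == "neg":
        return {m: -c for m, c in canon(p[1]).items()}
    left, right = canon(p[1]), canon(p[2])
    out = {}
    if tag == "add":
        for d in (left, right):
            for m, c in d.items():
                out[m] = out.get(m, 0) + c
    elif tag == "mul":
        for m1, c1 in left.items():
            for m2, c2 in right.items():
                m = tuple(sorted(m1 + m2))
                out[m] = out.get(m, 0) + c1 * c2
    else:
        raise ValueError(tag)
    return {m: c for m, c in out.items() if c != 0}   # erase null coefficients

def simplify(p):
    """Rebuild a syntactic polynomial with monomials in a canonical order."""
    terms = []
    for m, c in sorted(canon(p).items()):
        t = ("cst", c)
        for i in m:
            t = ("mul", t, ("var", i))
        terms.append(t)
    if not terms:
        return ("cst", 0)
    result = terms[0]
    for t in terms[1:]:
        result = ("add", result, t)
    return result

x, y = ("var", 0), ("var", 1)
P = ("add", ("mul", ("add", x, y), ("add", x, y)), ("neg", ("mul", x, x)))
Q = ("add", ("mul", y, y), ("mul", ("cst", 2), ("mul", y, x)))
```

Here P and Q denote the same polynomial, so `simplify` returns the same syntax tree for both, and the interpretation of P is unchanged by simplification on any valuation one cares to try. In the paper this last property is a theorem proved once and for all; the sketch can only test it on samples.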
Our problem is now to find a polynomial P and a valuation ρ such that $[\![P]\!]_\rho$
is equal to our initial expression. Such a function analyses the head constructor
of a HOL term: if it is an addition, multiplication, or unary negation, then the
corresponding syntactic operator is built and the respective subterms are
translated; there is also a predicate to recognize canonical elements of the ring;
all the other expressions are abstracted, that is a new variable is created (unless
this expression was already found), and ρ is updated so that it maps the new
variable to the corresponding expression.
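This translation step can be sketched as follows in Python. The encoding of expressions (tuples headed by "+", "*" or "-", integers as canonical ring elements, anything else opaque) and the name `reify` are assumptions made for the illustration; the real function works on HOL terms.

```python
def reify(expr, atoms):
    """Translate expr into a syntactic polynomial, extending the list `atoms`.
    The list plays the role of the valuation: variable i stands for atoms[i]."""
    if isinstance(expr, int):                        # canonical ring element
        return ("cst", expr)
    if isinstance(expr, tuple) and len(expr) == 3 and expr[0] in ("+", "*"):
        tag = "add" if expr[0] == "+" else "mul"
        return (tag, reify(expr[1], atoms), reify(expr[2], atoms))
    if isinstance(expr, tuple) and len(expr) == 2 and expr[0] == "-":
        return ("neg", reify(expr[1], atoms))
    if expr not in atoms:                            # abstract any other expression
        atoms.append(expr)
    return ("var", atoms.index(expr))                # reuse the variable if already seen

atoms = []
expr = ("+", ("*", "a", ("f", "b")), ("*", ("f", "b"), 2))
pol = reify(expr, atoms)
```

When two expressions are to be compared, the same `atoms` list should be passed to both calls so that both polynomials live in the same valuation.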

Deciding the equality of two expressions simply consists of applying the
simplification procedure described above to both sides, and comparing the
resulting polynomials (we must find two polynomials in the same valuation,
because the way we order monomials depends on the way the valuation is
computed).

Note that we cannot do the same conversion with real numbers, since equality 
is not decidable, which precludes the simplification of monomials with coefficient 
0. 

We tested this procedure on a quite large example: the 8 squares problem
(stated in [6]). This is a result similar to the binomial equality (a + b)² = a² +
2ab + b², but we have to simplify a polynomial consisting of the sum of 8 squares
of polynomials, each one being the sum of 8 monomials. Once fully developed,
we have a sum of 512 monomials, and we must sort it to merge and simplify
monomials of same power. Using the method described above, it took 41s to
solve.

This problem could be solved using REWRITE_CONV, and a conversion dealing 
with associative-commutative theory thanks to a variable ordering. This is a bit
tricky to implement and our attempts resulted in execution times higher than 
with computeLib. The fundamental reason is that such rewriters will most of 
the time perform elementary permutations, swapping two adjacent monomials, 
which makes sorting quadratic. On the other hand, by abstracting the problem 
of sorting, the reflection technique allows us to use an algorithm with better 
complexity. 

Running the same example on a computer algebra system, say Maple, takes 
far less than a second, but the comparison is quite unfair: 


— We can trust the result⁶, since it has been produced only using a very small
number of well-principled inference rules.

— The algorithm we used is straightforward, to say the least: almost no effort
was spent on improving the algorithm. We are of course limited by the fact
we must produce an applicative algorithm. This precludes the use of
hash-consing.

— We compare a program compiled in native code (Maple) with an interpreter
for the λ-calculus, run by the bytecode interpreter of Moscow ML, which is
compiled in native code. A slow down factor of 100 would not be surprising
for these two levels of interpretation alone.


6 In Maple, sqrt('-x'^2) and sqrt(''-x''^2) simplify to expressions of opposite
signs!
7 Further Work


There is still much room for improvement. The system was designed to be simple
so that we can test it and modify it easily. The directions for further work are
manifold.


Improving the Implementation 


There are very few optimizations of data structures. For instance, the database
of rewrites does not resort to discrimination nets. It is a hash table whose keys
are pairs of constant name and arity, and whose values are simply lists of rewrites.
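A table of this kind can be sketched in Python as a dictionary keyed by (constant name, arity). The class name `RewriteDb`, its methods and the tuple encoding of terms are assumptions made for this illustration, not the computeLib data structure itself.

```python
from collections import defaultdict

# Toy terms: ("var", name) | ("const", name) | ("app", f, a)

def head_and_args(term):
    """Split the curried application f a1 ... an into (f, [a1, ..., an])."""
    args = []
    while term[0] == "app":
        args.append(term[2])
        term = term[1]
    return term, args[::-1]

class RewriteDb:
    def __init__(self):
        # (constant name, arity) -> list of (lhs, rhs) rewrites
        self.table = defaultdict(list)

    def add(self, lhs, rhs):
        head, args = head_and_args(lhs)
        if head[0] != "const":
            raise ValueError("left hand side must be headed by a constant")
        self.table[(head[1], len(args))].append((lhs, rhs))

    def candidates(self, term):
        """Rewrites that may apply to term, found by a single table lookup."""
        head, args = head_and_args(term)
        if head[0] != "const":
            return []
        return self.table.get((head[1], len(args)), [])

def app(f, *args):
    for a in args:
        f = ("app", f, a)
    return f

ZERO, SUC, ADD = ("const", "0"), ("const", "SUC"), ("const", "+")
m, n = ("var", "m"), ("var", "n")

db = RewriteDb()
db.add(app(ADD, ZERO, n), n)
db.add(app(ADD, app(SUC, m), n), app(SUC, app(ADD, m, n)))
```

Each candidate still has to be tried by matching, one after the other; a discrimination net would share the work between rewrites whose left hand sides overlap.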


Adding More Features 


Here, we discuss how we can extend the class of theorems we are able to rewrite 
with: 


— Support conditional rewriting (i.e. equations with preconditions). This is 
equivalent to “when” clauses in Objective Caml. 
— Support an aliasing construction. In program 


fun f 0 = 1 | f n = n+n;


the second branch cannot be read as an equation because it overlaps with
the first one, which has higher priority. The theorem we get is rather
⊢ ∀k. f (S(k)) = S(k) + S(k). It would be nice to avoid multiple reconstruction
of a constructor using a kind of aliasing:


⊢ ∀n k. (n = S(k)) ==> (f n = n + n)


which can be read as the ML pattern branch f (n as S(k)) = n+n. This
is not really conditional rewriting because the value of k is not constrained
by the left hand side.


The difficulty with these extensions is that they make pattern-matching and 
reduction itself mutually dependent. 


Ensure Termination and Canonicity 


In the introduction we said that the idea was to put arguments of defined fun- 
ctions in canonical form before reducing, but the system does not make any 
difference between these classes of constants. It could be handy to check for the 
user that he provided rewrites for all the possible inputs (exhaustive pattern- 
matching), ensuring that we will eventually reduce to a canonical expression. 

Similarly, we do not check termination of the set of rewrites. Adding theorem
⊢ c = c trivially breaks termination.
Extraction


Now that we have the expected complexity, we could try to reduce the large constant
factor by optimizing the datastructures, but we will never cut it down drastically. 
Instead of interpreting the program, we could try to compile it, for instance by 
translating our rewrites into an ML program. This way, we would reduce the 
number of interpretation levels. 

This scheme has several serious limitations. One is that CBV_CONV still accepts
too wide a range of theorems. For instance, in ML, patterns must be linear, but
we can deal with non-linear patterns. More problematic is the fact that we can
assign to a polymorphic constant different algorithms depending on the way type
variables are instantiated (we have a kind of overloading). This is not possible
in ML since we do not have types at runtime. Finally, only closed ML programs
(without free variables) can be executed.


8 Conclusion 


We designed and implemented a rewriting strategy that allows the specification
of algorithms in a style similar to a purely applicative ML, and furthermore
provides an interpreter with the same asymptotic complexity as the corresponding
ML code. It allows the evaluation of expressions without taking extra care, but
it is also a qualitative improvement in the sense that computational problems
of medium and large size could not be dealt with by REWRITE_CONV. We showed
it was usable (and already used by several enthusiastic HOL developers!) by
running a very classical algorithmic example, and that it is much faster than the
usual tools of HOL such as REWRITE_CONV. However, we claim the contribution of
computeLib is not only about efficiency: it also strengthens the vision of HOL as
a programming language and as a tool for software verification. For the moment,
this is still modest, but seems promising. The idea of extraction could bring a
major speed up.

The second aspect of this work is its generality, based on an abstract notion
of theorems and a small number of inference rules. This made it very easy to
adapt it to Isabelle’s meta-logic. The resulting conversion is not yet as efficient as
with HOL, since the kernel is not implemented with explicit substitutions. It is
still slower than Isabelle’s optimized simplifier, but could be improved since about
half of the time is spent only instantiating rewrites, an operation which has not
been optimized in section 4. Another asset of this abstractness is the possibility
to trade safety for efficiency, depending on the way it is implemented.
Abstract. This paper presents proof terms for simply typed, intuitionistic
higher order logic, a popular logical framework. Unification-based
algorithms for the compression and reconstruction of proof terms are
described and have been implemented in the theorem prover Isabelle.
Experimental results confirm the effectiveness of the compression scheme.


1 Introduction 


In theorem provers based on the LCF approach, theorems can only be construc- 
ted by a small set of primitive inference rules. Provided the implementation of 
these rules is correct, all theorems obtained in this way are sound. Hence it is 
often claimed that constructing an explicit proof term for each theorem is unne- 
cessary. This is only partially true, however. If the core inference engine of a 
theorem prover is relatively large, correctness is difficult to ensure. Being able to 
verify proof terms by a small and independent proof checker helps to minimize 
the risks. Moreover, a precise notion of proof terms facilitates the exchange of 
proofs between different theorem proving systems. Finally, proof terms are a 
prerequisite for proof transformation and analysis or the extraction of computa- 
tional content from proofs. Probably the most prominent application these days 
is proof-carrying code [5], a technique that can be used for safe execution of un- 
trusted code. For these reasons we have extended Isabelle [9] with proof terms. 
However, apart from the actual implementation, our work is largely independent
of Isabelle and most of this paper deals with the general topic of proof terms
for simply typed, intuitionistic higher order logic (abbreviated to λHOL below),
Isabelle’s meta logic. Because other logics (e.g. full HOL) can be encoded in this
meta logic, this immediately yields proof terms for those logics as well.

We start with a disclaimer: the idea of proof terms based on typed λ-calculus
has been around for some time now and is the basis of a number of proof
assistants for type theory, for example Coq [2]. Even more, with the advent of
“pure type systems” and the λ-cube [1], it became clear what proof terms for
λHOL look like in principle (although this seems to have had little impact on the
HOL world). What we have done is to re-introduce the strict syntactic separation
between terms, types, and proofs to make it more amenable to readers from a
simply typed background. Thus our presentation of proof terms can be seen as a
partial evaluation of the corresponding pure type system, i.e. separating the
layers. Things are in fact a bit more complicated due to the presence of schematic
polymorphism in our term language.

The main original contribution of the paper is a detailed presentation of 
proof compression. A naive implementation of proof terms results in proofs of 
enormous size because with every occurrence of a proof rule the instantiations 
for its free variables are recorded. Thus it is natural to try and leave out some of 
those terms and to reconstruct them by unification during proof checking. Necula 
and Lee [6] have presented a scheme for proof compression in LF, another logical 
framework based on type theory. They analyze the proof rules of the object logic 
statically to determine what can be reconstructed by a weak form of unification. 
In contrast, we do a dynamic analysis of each proof term to determine what 
can be dropped. Reconstruction of missing information by unification is also 
available in other systems, e.g. Elf [12,11], but none of them offers an automatic 
dynamic compression algorithm. 

There has also been work on recording proofs in the HOL system [3,14], but 
it is firmly based on a notion of proof that directly reflects the implementation 
of inferences as calls to ML functions. These proof objects lack the conciseness 
of λ-terms and it is less clear how to compress them other than textually.

We start by presenting the logical framework (§2) and its λ-calculus based
proof terms (§3). In order to shrink the size of proofs we introduce partial proofs 
(§4), show how to collect equality constraints from a (partial) proof (§4.1), how 
to solve these constraints (§4.2) (to check that the proof is correct), and how to 
generate partial proofs from total ones, i.e. how to compress proofs (§4.3). 


2 The Logical Framework 


In a nutshell, Isabelle’s meta logic [9,8] is the minimal higher order logic of
implication and universal quantification over simply typed λ-terms including
schematic polymorphism. Thus types are first order only, which makes type
reconstruction decidable. A type τ is either a variable α or a compound type
expression (τ₁,...,τₙ)tc, where tc is a type constructor and n is its arity. The
(infix) constructor ⇒ for function types has arity 2. We assume implicitly that
all types are well-formed, i.e. every type constructor is applied to the correct
number of arguments. The set t of terms is defined in the usual way by

$$ t = x \mid c \mid \lambda x{::}\tau.\ t \mid t\ t $$

Formulae are terms of the primitive type prop. The logical connectives are:


universal quantification   ⋀ :: (α ⇒ prop) ⇒ prop
implication                ⟹ :: prop ⇒ prop ⇒ prop


Now we show how an object logic is formalized in this meta logic. As an example
we have chosen a fragment of HOL. First we introduce new types and constants
for representing the connectives of this logic:


Tr :: bool ⇒ prop            ∀ :: (α ⇒ bool) ⇒ bool
⟶ :: bool ⇒ bool ⇒ bool      ∃ :: (α ⇒ bool) ⇒ bool
Here, bool is the type of object level propositions. The function Tr establishes
a connection between meta level and object level truth values: the expression
Tr P should be read as “P is true”. The application of Tr is occasionally dropped
when writing down formulae. The inference rules for the meta logic consist of
the usual introduction and elimination rules and are shown in §3.1 below.
Inference rules of object logics are usually written like this:

$$ \frac{\phi_1 \quad \ldots \quad \phi_n}{\psi} $$

In our meta logic they become nested implications:

$$ \phi_1 \Longrightarrow \cdots \Longrightarrow \phi_n \Longrightarrow \psi $$
Here are some examples: 


impI : ⋀A B. (Tr A ⟹ Tr B) ⟹ Tr (A ⟶ B)
impE : ⋀P Q R. Tr (P ⟶ Q) ⟹ Tr P ⟹ (Tr Q ⟹ Tr R) ⟹ Tr R
allI : ⋀P. (⋀x. Tr (P x)) ⟹ Tr (∀x. P x)
allE : ⋀P x R. Tr (∀x. P x) ⟹ (Tr (P x) ⟹ Tr R) ⟹ Tr R
exI  : ⋀P x. Tr (P x) ⟹ Tr (∃x. P x)
exE  : ⋀P Q. Tr (∃x. P x) ⟹ (⋀x. Tr (P x) ⟹ Tr Q) ⟹ Tr Q


Note that the introduction rules impI and allI for object level implication
and universal quantification are expressed by simply referring to the meta level
counterparts of these connectives. The expression ⋀x. φ is just an abbreviation
for ⋀(λx. φ), and similarly for ∀ and ∃.


3 Proof Terms 


3.1 Basic Concepts 


The set of proof terms p is defined as follows:

$$ p = h \mid c_{[\overline{\tau_n}/\overline{\alpha_n}]} \mid \lambda h{:}\phi.\ p \mid \lambda x{::}\tau.\ p \mid p\ p \mid p\ t $$

The letters h, c, x, t, φ and τ denote proof variables, proof constants, term
variables, terms of arbitrary type, terms of type prop and types, respectively.
Note that terms, types and proof terms are considered as separate concepts. This
is in contrast to type theoretic frameworks, where these concepts are identified.
We will write Γ ⊢ p : φ for “p is a proof of φ in context Γ”, where φ is a term
of type prop, representing the logical proposition proved by p. The context Γ
associates a proof variable with a term representing the proposition whose proof
it denotes, and a term variable with its type. We require each context to be
well-formed, i.e. every variable is associated with at most one term or type. Proof
constants correspond to axioms or already proved theorems. The environment Σ
maps proof constants to terms representing propositions. Our language of proof
terms allows abstraction over proof and term variables, as well as application of
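proofs to proofs and terms. The abstractions correspond to the introduction of ⋀
and ⟹, while applications correspond to the elimination of these connectives.
In contrast to polymorphic λ-calculi, no explicit application and abstraction is
provided for types. To achieve a certain degree of polymorphism, we allow Σ(c)
to contain free type variables and introduce the notation
$c_{[\overline{\tau_n}/\overline{\alpha_n}]}$ to specify a suitable instantiation for
them. The notion of provability can now be defined inductively as follows:

$$
\frac{}{\Gamma, h:\phi, \Gamma' \vdash h : \phi}
\qquad
\frac{\Sigma(c) = \phi}{\Gamma \vdash c_{[\overline{\tau_n}/\overline{\alpha_n}]} : \phi[\overline{\tau_n}/\overline{\alpha_n}]}
$$

$$
\frac{\Gamma, h:\phi \vdash p : \psi \qquad \Gamma \vdash \phi :: \mathit{prop}}{\Gamma \vdash (\lambda h{:}\phi.\ p) : \phi \Longrightarrow \psi}
\qquad
\frac{\Gamma, x::\tau \vdash p : \phi}{\Gamma \vdash (\lambda x{::}\tau.\ p) : \bigwedge x{::}\tau.\ \phi}
$$

$$
\frac{\Gamma \vdash p : \phi \Longrightarrow \psi \qquad \Gamma \vdash q : \phi}{\Gamma \vdash (p\ q) : \psi}
\qquad
\frac{\Gamma \vdash p : \bigwedge x{::}\tau.\ \phi \qquad \Gamma \vdash t :: \tau}{\Gamma \vdash (p\ t) : \phi[t/x]}
$$

The judgement Γ ⊢ t :: τ used above expresses that the term t has type τ in
context Γ. We will not give a formal definition of this judgement here, since it
is well-known from simply typed lambda calculus.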
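To show how such rules translate into a checker, here is a deliberately small sketch in Python. It makes several simplifying assumptions that are not in the paper: terms are plain names, types and the instantiation of type variables are omitted, bound names are assumed distinct so that substitution needs no renaming, and the tuple encoding and function names are hypothetical.

```python
# Propositions: ("atom", name, args) | ("imp", phi, psi) | ("all", x, phi)
# Proof terms:  ("hyp", h) | ("const", c) | ("absp", h, phi, p) | ("abst", x, p)
#               | ("appp", p, q) | ("appt", p, t)

def free_vars(phi):
    """Names occurring free in phi (every name is treated as a possible variable)."""
    if phi[0] == "atom":
        return set(phi[2])
    if phi[0] == "imp":
        return free_vars(phi[1]) | free_vars(phi[2])
    return free_vars(phi[2]) - {phi[1]}

def subst(phi, x, t):
    """phi[t/x], assuming no variable capture can occur."""
    if phi[0] == "atom":
        return ("atom", phi[1], tuple(t if a == x else a for a in phi[2]))
    if phi[0] == "imp":
        return ("imp", subst(phi[1], x, t), subst(phi[2], x, t))
    if phi[1] == x:                          # x is rebound: nothing to replace
        return phi
    return ("all", phi[1], subst(phi[2], x, t))

def infer(sigma, gamma, p):
    """Return the proposition proved by p, or raise TypeError.
    sigma maps proof constants, gamma maps proof variables, to propositions."""
    tag = p[0]
    if tag == "hyp":
        return gamma[p[1]]
    if tag == "const":
        return sigma[p[1]]
    if tag == "absp":                        # introduction of implication
        _, h, phi, body = p
        return ("imp", phi, infer(sigma, {**gamma, h: phi}, body))
    if tag == "abst":                        # introduction of quantification
        _, x, body = p
        if any(x in free_vars(hyp) for hyp in gamma.values()):
            raise TypeError("variable occurs free in a hypothesis")
        return ("all", x, infer(sigma, gamma, body))
    if tag == "appp":                        # elimination of implication
        f, a = infer(sigma, gamma, p[1]), infer(sigma, gamma, p[2])
        if f[0] != "imp" or f[1] != a:
            raise TypeError("premise mismatch")
        return f[2]
    if tag == "appt":                        # elimination of quantification
        f = infer(sigma, gamma, p[1])
        if f[0] != "all":
            raise TypeError("not a universally quantified proposition")
        return subst(f[2], f[1], p[2])
    raise ValueError(tag)

A, B = ("atom", "A", ()), ("atom", "B", ())
Px = ("atom", "P", ("x",))
identity = ("absp", "h", A, ("hyp", "h"))
instance = ("absp", "h", ("all", "x", Px), ("appt", ("hyp", "h"), "a"))
```

For example, `identity` is checked to prove A ⟹ A, and `instance` to prove (⋀x. P x) ⟹ P a. A checker for the full system would also have to type-check terms and handle the instantiation of type variables in proof constants.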


3.2 Representing Backward Resolution Proofs 


This section explains how proof terms are constructed for proofs that are built 
up backwards by higher-order resolution as described by Paulson [8] and imple- 
mented in Isabelle. Although Isabelle also has LCF-like functions for forward 
proofs corresponding to the above inference rules, most proofs are constructed 
backwards without recourse to the forward rules. We now show how to augment 
these backward steps by proof terms. Thus the functions for backward resolution 
proofs need no longer be part of the trusted kernel of Isabelle. 
In Isabelle, proof states are represented by theorems of the form

\[ \frac{\psi_1 \ \ldots\ \psi_n}{\phi} \]

where $\phi$ is the proposition to be proved and $\psi_1, \ldots, \psi_n$ are the remaining
subgoals. Each subgoal is of the form $\bigwedge \bar x.\ \bar A \Longrightarrow P$, where $\bar x$ and $\bar A$ are a context
of parameters and local assumptions.


Resolution 

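A proof of a proposition $\phi$ starts with the trivial theorem $\phi \Longrightarrow \phi$ whose proof
term is $\lambda g : \phi.\ g$. The initial proof state is then refined successively using the
resolution rule

\[
\frac{R : \dfrac{P_1 \ \ldots\ P_m}{C}
\qquad
R' : \dfrac{P'_1 \ \ldots\ P'_{i-1} \quad P'_i \quad P'_{i+1} \ \ldots\ P'_k}{C'}}
{\theta\left(\dfrac{P'_1 \ \ldots\ P'_{i-1} \quad P_1 \ \ldots\ P_m \quad P'_{i+1} \ \ldots\ P'_k}{C'}\right)}
\]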


where $\theta C = \theta P'_i$
until a proof state with no more premises is reached. When refining a proof state
having the proof term $R'$ using a rule having the proof term $R$, the proof term
for the resulting proof state can be expressed by

\[ \theta(\lambda \bar q_{i-1}.\ \lambda \bar p_m.\ R'\ \bar q_{i-1}\ (R\ \bar p_m)) \]

where $\theta$ is a unifier of $C$ and $P'_i$. The first $i - 1$ abstractions are used to skip
the first $i - 1$ premises of $R'$. The next $m$ abstractions correspond to the new
subgoals introduced by $R$.


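Proof by assumption

If the formula $P_j$ in a subgoal $\bigwedge \bar x.\ \bar P_n \Longrightarrow P_j$ of a proof state having the proof
term $R$ equals one of the assumptions in $\bar P_n$, this subgoal trivially holds and can
therefore be removed from the proof state

\[
\frac{Q_1 \ \ldots\ Q_{i-1} \quad (\bigwedge \bar x.\ \bar P_n \Longrightarrow P_j) \quad Q_{i+1} \ \ldots\ Q_m}{C}
\quad\leadsto\quad
\frac{Q_1 \ \ldots\ Q_{i-1} \quad Q_{i+1} \ \ldots\ Q_m}{C}
\]

where $1 \le j \le n$

The proof term of the new proof state is obtained by supplying a suitable
projection function as an argument to $R$:

\[ \lambda \bar q_{i-1}.\ R\ \bar q_{i-1}\ (\lambda \bar x.\ \lambda \bar p_n.\ p_j) \]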


Lifting rules into a context 

Before a subgoal of a proof state can be refined by resolution with a certain 
rule, the context of both the premises and the conclusion of this rule has to 
be augmented with additional parameters and assumptions in order to be com- 
patible with the context of the subgoal. This process is called lifting. Isabelle 
distinguishes between two kinds of lifting: lifting over assumptions and lifting 
over parameters. The former simply adds a list of assumptions $\bar Q_n$ to both the
premises and the conclusion of a rule:

\[
R : \frac{P_1 \ \ldots\ P_m}{C}
\quad\leadsto\quad
\frac{\bar Q_n \Longrightarrow P_1 \ \ldots\ \bar Q_n \Longrightarrow P_m}{\bar Q_n \Longrightarrow C}
\]

The proof term for the lifted rule is

\[ \lambda \bar r_m\ \bar q_n.\ R\ (\bar r_m\ \bar q_n) \]

where the first $m$ abstractions correspond to the new premises (with additional
assumptions) and the next $n$ abstractions correspond to the additional assumptions.

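Lifting over parameters replaces all free variables $a_i$ in a rule $R[\bar a_k]$ by new
variables $a'_i$ of function type, which are applied to a list of new parameters $\bar x_n$.
The new parameters are bound by universal quantifiers.

\[
\frac{P_1[\bar a_k] \ \ldots\ P_m[\bar a_k]}{C[\bar a_k]}
\quad\leadsto\quad
\frac{\bigwedge \bar x_n.\ P_1[\bar a'_k\ \bar x_n] \ \ldots\ \bigwedge \bar x_n.\ P_m[\bar a'_k\ \bar x_n]}{\bigwedge \bar x_n.\ C[\bar a'_k\ \bar x_n]}
\]

The proof term for the lifted rule looks similar to the one in the previous case:

\[ \lambda \bar r_m\ \bar x_n.\ R[\bar a'_k\ \bar x_n]\ (\bar r_m\ \bar x_n) \]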


Proof Terms for Simply Typed Higher Order Logic 43 


3.3. Constructing an Example Proof 


We will now demonstrate how a proof term can be synthesized incrementally
while proving a theorem in backward style. A proof term corresponding to a
proof state will have the general form

\[ \lambda (g_1 : \phi_1) \ldots (g_n : \phi_n).\ \ldots (g_i\ \bar x^i\ \bar h^i) \ldots \]

where the bound variables $g_1, \ldots, g_n$ stand for proofs of the current subgoals
which are still to be found. The $\bar x^i$ and $\bar h^i$ appearing in the proof term $(g_i\ \bar x^i\ \bar h^i)$
are parameters and assumptions which may be used in the proof of subgoal $i$.
As an example, the construction of a proof term for the theorem

\[ (\exists x.\ \forall y.\ P\,x\,y) \longrightarrow (\forall y.\ \exists x.\ P\,x\,y) \]

will be shown by giving a proof term for each proof state. Initially,
the proof state is the trivial theorem:

step 0, remaining subgoal: $(\exists x.\ \forall y.\ P\,x\,y) \longrightarrow (\forall y.\ \exists x.\ P\,x\,y)$

\[ \lambda g : ((\exists x.\ \forall y.\ P\,x\,y) \longrightarrow (\forall y.\ \exists x.\ P\,x\,y)).\ g \]


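We first apply rule impI. Applying a suitable instance of this rule to the trivial
initial proof term yields

\[
\begin{array}{ll}
\lambda g : (\exists x.\ \forall y.\ P\,x\,y) \Longrightarrow (\forall y.\ \exists x.\ P\,x\,y). & \\
\quad (\lambda g' : ((\exists x.\ \forall y.\ P\,x\,y) \longrightarrow (\forall y.\ \exists x.\ P\,x\,y)).\ g') & \}\ \text{proof term from step 0} \\
\quad (\mathit{impI}\ (\exists x.\ \forall y.\ P\,x\,y)\ (\forall y.\ \exists x.\ P\,x\,y)\ g) & \}\ \text{instance of impI}
\end{array}
\]

and by $\beta\eta$ reduction of this proof term we obtain

step 1, remaining subgoal: $(\exists x.\ \forall y.\ P\,x\,y) \Longrightarrow (\forall y.\ \exists x.\ P\,x\,y)$

\[ \mathit{impI}\ (\exists x.\ \forall y.\ P\,x\,y)\ (\forall y.\ \exists x.\ P\,x\,y) \]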


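We now apply allI to the above proof state. Before resolving allI with the proof
state, its context has to be augmented with the assumption $\exists x.\ \forall y.\ P\,x\,y$ of the
current goal. The resulting proof term is

\[
\begin{array}{ll}
\lambda g : (\bigwedge y.\ \exists x.\ \forall y.\ P\,x\,y \Longrightarrow \exists x.\ P\,x\,y). & \\
\quad \mathit{impI}\ (\exists x.\ \forall y.\ P\,x\,y)\ (\forall y.\ \exists x.\ P\,x\,y) & \}\ \text{proof term from step 1} \\
\quad ((\lambda h_2 : (\exists x.\ \forall y.\ P\,x\,y \Longrightarrow \bigwedge y.\ \exists x.\ P\,x\,y). & \\
\qquad \lambda h_1 : (\exists x.\ \forall y.\ P\,x\,y). & \}\ \text{lifted instance of allI} \\
\qquad \mathit{allI}\ (\lambda y.\ \exists x.\ P\,x\,y)\ (h_2\ h_1)) & \\
\quad\ (\lambda h_3 : (\exists x.\ \forall y.\ P\,x\,y). & \}\ \text{rearranging quantifiers} \\
\qquad \lambda y{::}\beta.\ g\ y\ h_3)) &
\end{array}
\]

As before, we apply $\beta$ reduction, which yields

step 2, remaining subgoal: $\bigwedge y.\ \exists x.\ \forall y.\ P\,x\,y \Longrightarrow \exists x.\ P\,x\,y$

\[
\begin{array}{l}
\lambda g : (\bigwedge y.\ \exists x.\ \forall y.\ P\,x\,y \Longrightarrow \exists x.\ P\,x\,y). \\
\quad \mathit{impI}\ (\exists x.\ \forall y.\ P\,x\,y)\ (\forall y.\ \exists x.\ P\,x\,y) \\
\qquad (\lambda h_1 : (\exists x.\ \forall y.\ P\,x\,y). \\
\qquad\quad \mathit{allI}\ (\lambda y.\ \exists x.\ P\,x\,y)\ (\lambda y{::}\beta.\ g\ y\ h_1))
\end{array}
\]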


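By eliminating the existential quantifier using exE we get

step 3, remaining subgoal: $\bigwedge y\ x.\ \forall y.\ P\,x\,y \Longrightarrow \exists x.\ P\,x\,y$

\[
\begin{array}{l}
\lambda g : (\bigwedge y\ x.\ \forall y.\ P\,x\,y \Longrightarrow \exists x.\ P\,x\,y). \\
\quad \mathit{impI}\ (\exists x.\ \forall y.\ P\,x\,y)\ (\forall y.\ \exists x.\ P\,x\,y) \\
\qquad (\lambda h_1 : (\exists x.\ \forall y.\ P\,x\,y). \\
\qquad\quad \mathit{allI}\ (\lambda y.\ \exists x.\ P\,x\,y) \\
\qquad\qquad (\lambda y{::}\beta.\ \mathit{exE}\ (\lambda x.\ \forall y.\ P\,x\,y)\ (\exists x.\ P\,x\,y)\ h_1\ (g\ y)))
\end{array}
\]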


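Applying the introduction rule exI for the existential quantifier results in

step 4, remaining subgoal: $\bigwedge y\ x.\ \forall y.\ P\,x\,y \Longrightarrow P\ (?x\ y\ x)\ y$

\[
\begin{array}{l}
\lambda g : (\bigwedge y\ x.\ \forall y.\ P\,x\,y \Longrightarrow P\ (?x\ y\ x)\ y). \\
\quad \mathit{impI}\ (\exists x.\ \forall y.\ P\,x\,y)\ (\forall y.\ \exists x.\ P\,x\,y) \\
\qquad (\lambda h_1 : (\exists x.\ \forall y.\ P\,x\,y). \\
\qquad\quad \mathit{allI}\ (\lambda y.\ \exists x.\ P\,x\,y) \\
\qquad\qquad (\lambda y{::}\beta.\ \mathit{exE}\ (\lambda x.\ \forall y.\ P\,x\,y)\ (\exists x.\ P\,x\,y)\ h_1 \\
\qquad\qquad\quad (\lambda x{::}\alpha. \\
\qquad\qquad\qquad \lambda h_2 : (\forall y.\ P\,x\,y). \\
\qquad\qquad\qquad \mathit{exI}\ (\lambda x.\ P\,x\,y)\ (?x\ y\ x)\ (g\ y\ x\ h_2))))
\end{array}
\]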
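We now eliminate the universal quantifier using allE, which yields

step 5, remaining subgoal: $\bigwedge y\ x.\ P\ x\ (?y\ y\ x) \Longrightarrow P\ (?x\ y\ x)\ y$

\[
\begin{array}{l}
\lambda g : (\bigwedge y\ x.\ P\ x\ (?y\ y\ x) \Longrightarrow P\ (?x\ y\ x)\ y). \\
\quad \mathit{impI}\ (\exists x.\ \forall y.\ P\,x\,y)\ (\forall y.\ \exists x.\ P\,x\,y) \\
\qquad (\lambda h_1 : (\exists x.\ \forall y.\ P\,x\,y). \\
\qquad\quad \mathit{allI}\ (\lambda y.\ \exists x.\ P\,x\,y) \\
\qquad\qquad (\lambda y{::}\beta.\ \mathit{exE}\ (\lambda x.\ \forall y.\ P\,x\,y)\ (\exists x.\ P\,x\,y)\ h_1 \\
\qquad\qquad\quad (\lambda x{::}\alpha. \\
\qquad\qquad\qquad \lambda h_2 : (\forall y.\ P\,x\,y). \\
\qquad\qquad\qquad \mathit{exI}\ (\lambda x.\ P\,x\,y)\ (?x\ y\ x) \\
\qquad\qquad\qquad\quad (\mathit{allE}\ (P\ x)\ (?y\ y\ x)\ (P\ (?x\ y\ x)\ y)\ h_2\ (g\ y\ x)))))
\end{array}
\]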

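We can now prove the remaining subgoal by assumption, which is done by
substituting the projection function $\lambda (y{::}\beta)\ (x{::}\alpha).\ \lambda h_3 : (P\,x\,y).\ h_3$ for $g$:

step 6, no subgoals

\[
\begin{array}{l}
\mathit{impI}\ (\exists x.\ \forall y.\ P\,x\,y)\ (\forall y.\ \exists x.\ P\,x\,y) \\
\quad (\lambda h_1 : (\exists x.\ \forall y.\ P\,x\,y). \\
\qquad \mathit{allI}\ (\lambda y.\ \exists x.\ P\,x\,y) \\
\qquad\quad (\lambda y{::}\beta.\ \mathit{exE}\ (\lambda x.\ \forall y.\ P\,x\,y)\ (\exists x.\ P\,x\,y)\ h_1 \\
\qquad\qquad (\lambda x{::}\alpha. \\
\qquad\qquad\quad \lambda h_2 : (\forall y.\ P\,x\,y). \\
\qquad\qquad\quad \mathit{exI}\ (\lambda x.\ P\,x\,y)\ x \\
\qquad\qquad\qquad (\mathit{allE}\ (P\ x)\ y\ (P\,x\,y)\ h_2\ (\lambda h_3 : (P\,x\,y).\ h_3)))))
\end{array}
\]


4 Partial Proof Terms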


Proof terms are large, contain much redundant information, and need to be
compressed. The solution is simple: leave out everything that can be reconstructed.
But since we do not want to complicate reconstruction too much, it should not
degenerate into proof search. Thus we have to keep the skeleton of the proof.
What can often be left out are the $\phi$, $\tau$ and $t$ in $\lambda h{:}\phi.\ p$, $\lambda x{::}\tau.\ p$ and $(p\ t)$.


Since we will have to reconstruct the missing information later on, it is
conceptually simpler to model the missing information by unification variables.
These are simply a new class of free (term and type) variables, syntactically
distinguished by a leading “?”, as in $?f$ and $?\alpha$. We will sometimes write $?f_\tau$ to
emphasize that $?f$ has type $\tau$. Substitutions are functions that act on unification
variables, e.g. $\theta = \{?f \mapsto \lambda x.x,\ ?\alpha \mapsto \tau\}$.

In the remainder of this section we work with partial proofs, where terms
and types may contain unification variables, as in $(p\ ?f)$. Note that term
unification variables that occur within the scope of a $\lambda$ need to be “lifted” as in
$\lambda x{::}\tau.\ (p\ (?f\ x))$. Because of this lifting, this partial information may take up
more space than it saves. Therefore an actual implementation is bound to
introduce separate new constructors for proof trees, e.g. $\lambda h{:}\_.\ p$, $\lambda x{::}\_.\ p$ and $(p\ \_)$,
where $\_$ represents the missing information. This is in fact what Necula and Lee
describe [6]. However, it turns out that our partial proofs are easier to treat
mathematically, are not far from the “_”-version, and also allow dropping only part
of a term (although we will not make use of this feature).


Of course, we cannot check partial proofs with the rules of §3.1. In fact, we
may not be able to check them at all, because too much information is missing.
But we can collect equality constraints that need to hold in order for the proof to
be correct. Such equality constraints are of the form $T_1 =^? T_2$, where $T_1$ and $T_2$
are either both terms or both types. A substitution solves a constraint if the two
terms become equal modulo $\beta\eta$-conversion, or if the two types become identical.
Sets of such equality constraints are usually denoted by the letters $C$ and $D$. To
separate $C$ into term and type constraints, let $C_t$ denote the term and $C_\tau$ the
type part. The subscripts $t$ and $\tau$ do not refer to variable names but are simply
keywords.


We will now show how to extract a set of constraints from a partial proof; 
how to solve those constraints is discussed later on. 


4.1 Collecting Constraints 


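The relation $\Gamma \vdash p \triangleright (\phi, C)$ is a partial function taking $\Gamma$ and a partial proof $p$
and producing a formula $\phi$ (which may contain unification variables) and a set
of constraints $C$. The function will be defined such that, if $\theta$ solves $C$, then $\theta(p)$
proves $\theta(\phi)$. The notation $V_\Gamma$ denotes the list of all term variables declared in
$\Gamma$, and $T_\Gamma$ denotes the list of their types, i.e. $V_\Gamma = x_1 \ldots x_n$ and $T_\Gamma = \tau_1 \ldots \tau_n$
for $\Gamma = x_1 :: \tau_1 \ldots x_n :: \tau_n$.

\[
\frac{}{\Gamma, h{:}\phi, \Gamma' \vdash h \triangleright (\phi, \emptyset)}
\qquad
\frac{\Sigma(c) = \phi}{\Gamma \vdash c_{[\bar\tau_n/\bar\alpha_n]} \triangleright (\phi[\bar\tau_n/\bar\alpha_n], \emptyset)}
\]
\[
\frac{\Gamma, h{:}\phi \vdash p \triangleright (\psi, C) \qquad \Gamma \vdash \phi \triangleright (\tau, D)}
{\Gamma \vdash (\lambda h{:}\phi.\ p) \triangleright (\phi \Longrightarrow \psi,\ C \cup D \cup \{\tau =^? \mathit{prop}\})}
\]
\[
\frac{\Gamma, x{::}\tau \vdash p \triangleright (\phi, C)}
{\Gamma \vdash (\lambda x{::}\tau.\ p) \triangleright (\bigwedge x{::}\tau.\ \phi,\ \{\lambda x{::}\tau.\ r =^? \lambda x{::}\tau.\ s \mid (r =^? s) \in C_t\} \cup C_\tau)}
\]
\[
\frac{\Gamma \vdash p \triangleright (\phi, C) \qquad \Gamma \vdash q \triangleright (\psi, D)}
{\Gamma \vdash (p\ q) \triangleright (?f_{T_\Gamma \to \mathit{prop}}\ V_\Gamma,\ \{\phi =^? (\psi \Longrightarrow ?f_{T_\Gamma \to \mathit{prop}}\ V_\Gamma)\} \cup C \cup D)}
\]
\[
\frac{\Gamma \vdash p \triangleright (\phi, C) \qquad \Gamma \vdash t \triangleright (\tau, D)}
{\Gamma \vdash (p\ t) \triangleright (?f_{T_\Gamma \to \tau \to \mathit{prop}}\ V_\Gamma\ t,\ \{\phi =^? \bigwedge x{::}\tau.\ ?f_{T_\Gamma \to \tau \to \mathit{prop}}\ V_\Gamma\ x\} \cup C \cup D)}
\]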


As usual, the unification variables $?f$ must be “new” in each case.

The above rules follow those in §3.1 very closely. For example, the intuition
behind the rule for the application $(p\ q)$ is the following: if $p$ proves proposition
$\phi$, then $\phi$ must be some implication and the proposition $\psi$ proved by $q$ must
be the premise of this implication. Moreover, the proposition proved by $(p\ q)$ is
the conclusion of the implication. The set of constraints for $(p\ q)$ is the union of
the constraints for $p$ and $q$, plus one additional constraint expressing that $\phi$ is a
suitable implication. One point to note is the judgement $\Gamma \vdash t \triangleright (\tau, D)$ used in
two premises of the constraint collection rules. It corresponds to $\Gamma \vdash t :: \tau$ just
like $\Gamma \vdash p \triangleright (\phi, C)$ corresponds to $\Gamma \vdash p : \phi$, i.e. $D$ is a set of type constraints
whose solvability implies that $t$ has type $\tau$. The rules for $\Gamma \vdash t \triangleright (\tau, D)$ are not
given because they are well-known: both from the literature about type inference
for simply typed terms and because they closely resemble the rules above, just
one level down. In a setting where types, terms and proofs are not syntactically
distinguished, we would only have one judgement $\cdot \vdash \cdot \triangleright (\cdot, \cdot)$.

We introduce the notation $(\phi, C) = \mathit{collect}(\Gamma, p)$ as a functional variant of
$\Gamma \vdash p \triangleright (\phi, C)$.
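
For the purely implicational fragment, collect can be sketched in a few lines of
Python. This is an illustration under stated assumptions rather than the function
defined above: there are no term variables, so the lifting of $?f$ over $V_\Gamma$ and all
type constraints disappear; a missing proposition is written `None`; and the tuple
encoding and the naming scheme for fresh variables are hypothetical.

```python
import itertools

# Partial proofs: ("hyp", h) | ("lam", h, phi_or_None, p) | ("app", p, q)
# Propositions:   ("atom", a) | ("imp", A, B) | ("uvar", "?f0")

def collect(gamma, p, counter=None):
    """Return (proposition, constraints) for the partial proof p in context gamma."""
    counter = itertools.count() if counter is None else counter
    fresh = lambda: ("uvar", f"?f{next(counter)}")
    tag = p[0]
    if tag == "hyp":
        return gamma[p[1]], []
    if tag == "lam":
        _, h, phi, body = p
        if phi is None:                  # omitted proposition: use a new variable
            phi = fresh()
        psi, c = collect({**gamma, h: phi}, body, counter)
        return ("imp", phi, psi), c
    if tag == "app":
        phi, c = collect(gamma, p[1], counter)
        psi, d = collect(gamma, p[2], counter)
        result = fresh()                 # the yet unknown conclusion of phi
        # phi must be an implication whose premise is psi
        return result, [(phi, ("imp", psi, result))] + c + d
    raise ValueError(f"unknown proof term: {p!r}")

# Usage: the partial proof  \h1:_. \h2:_. h1 h2
prop, constraints = collect({}, ("lam", "h1", None,
                                 ("lam", "h2", None,
                                  ("app", ("hyp", "h1"), ("hyp", "h2")))))
print(prop)
print(constraints)
```

Solving the single constraint produced here instantiates the first omitted
proposition with an implication whose premise is the second one, which is exactly
the information the partial proof left out.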

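Example 1. Let $p = \lambda x{::}?\alpha_1.\ \lambda h_1{:}(?f_{?\alpha_2}\ x).\ \lambda y{::}?\alpha_3.\ \lambda h_2{:}(?f_{?\alpha_4}\ x\ y).\ h_1\ h_2\ y$.
Then

\[
\begin{array}{l}
\mathit{collect}([], p) = (\bigwedge x{::}?\alpha_1.\ ?f_{?\alpha_2}\ x \Longrightarrow \bigwedge y{::}?\alpha_3.\ ?f_{?\alpha_4}\ x\ y \Longrightarrow ?g'_{?\alpha_1 \to ?\alpha_3 \to ?\alpha_3 \to \mathit{prop}}\ x\ y\ y, \\
\quad \{\lambda x{::}?\alpha_1.\ \lambda y{::}?\alpha_3.\ ?f_{?\alpha_2}\ x =^? \lambda x{::}?\alpha_1.\ \lambda y{::}?\alpha_3.\ (?f_{?\alpha_4}\ x\ y \Longrightarrow ?g_{?\alpha_1 \to ?\alpha_3 \to \mathit{prop}}\ x\ y), \\
\quad\ \ \lambda x{::}?\alpha_1.\ \lambda y{::}?\alpha_3.\ ?g_{?\alpha_1 \to ?\alpha_3 \to \mathit{prop}}\ x\ y =^? \lambda x{::}?\alpha_1.\ \lambda y{::}?\alpha_3.\ \bigwedge z{::}?\alpha_3.\ ?g'_{?\alpha_1 \to ?\alpha_3 \to ?\alpha_3 \to \mathit{prop}}\ x\ y\ z, \\
\quad\ \ ?\alpha_2 =^? ?\alpha_1 \to ?\beta_1,\ ?\beta_1 =^? \mathit{prop}, \\
\quad\ \ ?\alpha_4 =^? ?\alpha_1 \to ?\alpha_3 \to ?\beta_2,\ ?\beta_2 =^? \mathit{prop}\})
\end{array}
\]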


where $?g$, $?g'$ and $?\beta_i$ are new variables generated during constraint collection.

Theorem 1. (Soundness and completeness)

1. If $\Gamma \vdash p \triangleright (\phi', C)$ and $\theta$ solves $C$ then $\theta(\Gamma) \vdash \theta(p) : \theta(\phi')$.
2. If $\Gamma \vdash p : \phi$ and $\Gamma \vdash p \triangleright (\phi', C)$ then $C \cup \{\phi =^? \phi'\}$ is solvable.

This theorem shows the advantage of working with partial proofs as opposed to
proofs containing “_”: we can produce the full proof by instantiation from the
partial one.

Thus there are two possible system architectures for checking partial proofs:

• either constraint collection $\Gamma \vdash p \triangleright (\phi', C)$ and constraint solving are part
  of the trusted kernel and $\theta(\phi')$ is accepted as the correct answer;

• or, for the security conscious, neither collecting nor solving is part of the
  trusted kernel and their result is checked by checking $\theta(\Gamma) \vdash \theta(p) : \theta(\phi')$.


4.2 Solving Constraints 


Our constraints are a mixture of term and type constraints. Type constraints 
can be solved by first-order unification and thus do not need to be discussed 
here: they can be solved at any time in any order. Therefore we concentrate on 
term constraints in this subsection. 
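
As a reminder of what first-order unification of type constraints involves, here
is a minimal Python sketch. The encoding (strings beginning with "?" for type
variables, other strings for base types, `("fun", a, b)` for a function type) and
all function names are assumptions made for illustration. The usage lines solve
the four type constraints of Example 1, once in the given order and once reversed.

```python
def walk(t, s):
    """Follow variable bindings in substitution s until t is unbound or compound."""
    while isinstance(t, str) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    """Does type variable v occur in t under substitution s?"""
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])

def unify(a, b, s):
    """Extend s so that a and b become identical, or raise ValueError."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, str) and a.startswith("?"):
        if occurs(a, b, s):
            raise ValueError("occurs check failed")
        return {**s, a: b}
    if isinstance(b, str) and b.startswith("?"):
        return unify(b, a, s)
    if isinstance(a, tuple) and isinstance(b, tuple):
        return unify(a[2], b[2], unify(a[1], b[1], s))
    raise ValueError(f"clash between {a!r} and {b!r}")

def solve(constraints):
    """Solve a list of type constraints (pairs of types) from left to right."""
    s = {}
    for a, b in constraints:
        s = unify(a, b, s)
    return s

def resolve(t, s):
    """Apply s to t exhaustively."""
    t = walk(t, s)
    if isinstance(t, tuple):
        return ("fun", resolve(t[1], s), resolve(t[2], s))
    return t

constraints = [
    ("?a2", ("fun", "?a1", "?b1")), ("?b1", "prop"),
    ("?a4", ("fun", "?a1", ("fun", "?a3", "?b2"))), ("?b2", "prop"),
]
for order in (constraints, constraints[::-1]):
    s = solve(order)
    print(resolve("?a2", s), resolve("?a4", s))
```

Both orders lead to the same instantiation of `?a2` and `?a4`, in line with the
remark that type constraints can be solved at any time in any order.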

Since higher-order unification is undecidable, we restrict ourselves to unification of
so called (higher-order) patterns [4], i.e. $\lambda$-terms where each occurrence of a
unification variable is applied only to distinct bound variables. For example
$\lambda xy.\ ?f\ y\ x$ is a pattern, whereas $\lambda x.\ ?f\ x\ x$ and $?f\ a$ are not patterns. The set
of all patterns is denoted by $\mathit{Pat}$. For us, the key property of patterns is that
their unification is decidable and that solvable pattern unification problems have
most general unifiers [4,12,7,13]. Thus we may assume a function mgu taking two
patterns as arguments and either failing or returning the most general unifier of
its arguments.
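
The pattern condition is easy to test mechanically. A minimal Python sketch,
assuming $\beta$-normal terms in a hypothetical tuple encoding with uniquely named
bound variables (the helper names `app`, `lam`, `spine` and `is_pattern` are
likewise made up for this illustration):

```python
# Terms: ("lam", x, body) | ("app", f, a) | ("var", x) | ("const", c) | ("uvar", "?f")

def app(f, *args):
    """Build the curried application f a1 ... an."""
    for a in args:
        f = ("app", f, a)
    return f

def lam(names, body):
    """Build the nested abstraction over the given variable names."""
    for n in reversed(names):
        body = ("lam", n, body)
    return body

def spine(t):
    """Split f a1 ... an into its head f and the argument list [a1, ..., an]."""
    args = []
    while t[0] == "app":
        args.append(t[2])
        t = t[1]
    return t, args[::-1]

def is_pattern(t, bound=()):
    """True if every unification variable is applied only to distinct bound variables."""
    if t[0] == "lam":
        return is_pattern(t[2], bound + (t[1],))
    head, args = spine(t)
    if head[0] == "uvar":
        names = [a[1] for a in args if a[0] == "var" and a[1] in bound]
        return len(names) == len(args) and len(set(names)) == len(names)
    return all(is_pattern(a, bound) for a in args)

x, y, a, F = ("var", "x"), ("var", "y"), ("const", "a"), ("uvar", "?f")
print(is_pattern(lam(["x", "y"], app(F, y, x))))   # the first example above
print(is_pattern(lam(["x"], app(F, x, x))))        # repeated bound variable
print(is_pattern(app(F, a)))                       # argument is not a bound variable
```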

Since the constraints $C$ generated by $\Gamma \vdash p \triangleright (\phi, C)$ may contain
non-patterns, we have to delay solving those constraints until (hopefully) they are
turned into patterns by the solution of other pattern constraints. Of course, in
the worst case $C$ may not contain any patterns at all, in which case we have to
give up. Thus we have to take care when constructing partial proofs to make
sure the complete proof can be reconstructed by pattern unification.


Example 2. The first constraint in $\{?f\ ?g =^? ?z,\ \lambda xy.\ ?f\ x =^? \lambda xy.\ ?f\ y\}$ is not
a pattern constraint, but if the second constraint is solved first, it yields the
substitution $?f \mapsto \lambda x.\ ?u$, which turns the first constraint into the trivial $?u =^? ?z$.


We say that a set of equality constraints $C$ can be solved by pattern unification
if $C \longrightarrow^* \emptyset$, where $\longrightarrow$ is defined by the rewrite rule

\[ C \cup \{s =^? t\} \longrightarrow \theta(C) \]

where $s$ and $t$ are patterns and $\mathit{mgu}(s,t)$ returns the unifier $\theta$. Note that the
choice of which pair $s =^? t$ to solve at which point is immaterial because $\longrightarrow$ is
confluent. This is well-known for first-order terms, and holds for patterns as well
because of the existence of most general unifiers and the following easy results:


48 S. Berghofer and T. Nipkow 


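Lemma 1. 1. If $s, t \in \mathit{Pat}$ and $\mathit{mgu}(s,t)$ returns $\theta$ then $\mathit{Ran}(\theta) \subseteq \mathit{Pat}$.
2. If $\mathit{Ran}(\theta) \subseteq \mathit{Pat}$ and $u \in \mathit{Pat}$ then $\theta(u) \in \mathit{Pat}$.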


4.3 Compressing Proof Terms 


The basic idea here is straightforward: given a proof term, remove all of the
information that can be reconstructed by unification. If our meta logic were
Prolog (i.e. proof terms contain no $\lambda$s), we could drop all terms $t$ in an application
$(p\ t)$, because first-order unification can reconstruct them. In our setting we
require that pattern unification should be able to reconstruct the missing terms.

Compression is performed in three phases. First, all terms and types in the
proof are replaced by suitably lifted unification variables. A substitution
reversing the term abstractions is constructed as well.


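\[
\begin{array}{lcl}
\mathit{varify}(\bar x, \bar\tau, \lambda h{:}\phi.\ p) &=& \textbf{let}\ (p', \theta) = \mathit{varify}(\bar x, \bar\tau, p) \\
 && \textbf{in}\ (\lambda h{:}(?f_{\bar\tau \to \mathit{prop}}\ \bar x).\ p',\ \theta \cup \{?f \mapsto \lambda \bar x.\ \phi\}) \\
\mathit{varify}(\bar x, \bar\tau, \lambda y{::}\tau.\ p) &=& \textbf{let}\ (p', \theta) = \mathit{varify}(\bar x\ y, \bar\tau\ ?\alpha, p) \\
 && \textbf{in}\ (\lambda y{::}?\alpha.\ p',\ \theta) \\
\mathit{varify}(\bar x, \bar\tau, (p\ q)) &=& \textbf{let}\ (p', \theta) = \mathit{varify}(\bar x, \bar\tau, p);\ (q', \theta') = \mathit{varify}(\bar x, \bar\tau, q) \\
 && \textbf{in}\ ((p'\ q'),\ \theta \cup \theta') \\
\mathit{varify}(\bar x, \bar\tau, (p\ t)) &=& \textbf{let}\ (p', \theta) = \mathit{varify}(\bar x, \bar\tau, p) \\
 && \textbf{in}\ ((p'\ (?f_{\bar\tau \to ?\alpha}\ \bar x)),\ \theta \cup \{?f \mapsto \lambda \bar x.\ t\}) \\
\mathit{varify}(\bar x, \bar\tau, c_{[\bar\tau_n/\bar\alpha_n]}) &=& (c_{[?\bar\alpha_n/\bar\alpha_n]},\ \emptyset)
\end{array}
\]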


Thus $\mathit{varify}(\bar x, \bar\tau, p) = (p', \theta)$ does not quite imply $\theta(p') = p$ because $\theta$ does
not reverse the type abstractions: types are first-order and thus they can be
reconstructed uniquely by unification.

Then the constraints are extracted from the resulting partial proof: $(\phi', C) =
\mathit{collect}(\Gamma, p')$. Finally we compute (with the help of function solves) a minimal
set of term variables $V \subseteq \mathit{Dom}(\theta)$ such that $\theta|_V(C)$ can be solved by pattern
unification. Thus the overall algorithm for compressing a proof $p$ is

\[
\begin{array}{lcl}
\mathit{compress}(p, \phi) &=& \textbf{let}\ (p', \theta) = \mathit{varify}([], [], p) \\
 && \phantom{\textbf{let}}\ (\phi', C) = \mathit{collect}([], p') \\
 && \phantom{\textbf{let}}\ V = \mathit{solves}(C \cup \{\phi =^? \phi'\}, \theta) \\
 && \textbf{in}\ \theta|_V(p')
\end{array}
\]

where $\theta|_V$ is the restriction of $\theta$ to $V$. Function $\mathit{solves}(D, \theta)$ returns $V \subseteq \mathit{Dom}(\theta)$
such that $\theta|_V(D)$ is solvable by pattern unification. The details are explained
below.

The main correctness theorem expresses that compression does not lose any 
information in the following sense: the constraints collected from the compressed 
version of a valid proof are solvable by pattern unification and any solution yields 
a proof of the original formula. 


Theorem 2. Let $\phi$ be ground. If $\vdash p : \phi$, $q = \mathit{compress}(p, \phi)$ and $(\psi, C) =
\mathit{collect}([], q)$ then

1. $C \cup \{\phi =^? \psi\}$ is solvable by pattern unification and
2. if $\theta$ solves $C \cup \{\phi =^? \psi\}$ then $\vdash \theta(q) : \phi$.


The second part of the theorem follows directly from the soundness of collect
(part 1 of Theorem 1) because $\phi$ is ground, i.e. $\theta(\phi) = \phi$.

The fact that $C \cup \{\phi =^? \psi\}$ is solvable by pattern unification is a bit more
subtle. From $q = \mathit{compress}(p, \phi)$ it follows by definition of compress that there
are $p'$, $\theta$, $\phi'$, $C'$ and $V$ such that $(p', \theta) = \mathit{varify}([], [], p)$, $(\phi', C') = \mathit{collect}([], p')$,
$V = \mathit{solves}(C' \cup \{\phi =^? \phi'\}, \theta)$ and $q = \theta|_V(p')$. Because $(\psi, C) = \mathit{collect}([], q) =
\mathit{collect}([], \theta|_V(p'))$ and $(\phi', C') = \mathit{collect}([], p')$ it can be shown that $\psi = \theta|_V(\phi')$
and $C = \theta|_V(C')$. Thus $C \cup \{\phi =^? \psi\} = \theta|_V(C' \cup \{\phi =^? \phi'\})$. It can be shown
that $\theta(C' \cup \{\phi =^? \phi'\})$ is solvable by pattern unification. Appealing to Theorem 3
below, it follows that so is $C \cup \{\phi =^? \psi\}$.


Theorem 3. If $\theta(C)$ is solvable by pattern unification, then $\mathit{solves}(C, \theta)$
terminates successfully with a set of variables $V$ such that $\theta|_V(C)$ is solvable by pattern
unification.


This can be viewed as a specification of solves or as its main correctness theorem.
Of course there are trivial implementations of solves that simply return $\mathit{Dom}(\theta)$.
What we want is a minimal set $V$ with the stated property. Note that in general
there is no least such $V$:


Example 3. Let $C = \{\lambda x.\ ?f(?g(x)) =^? \lambda x.\ f(x)\}$ and $\theta = \{?f \mapsto f,\ ?g \mapsto \lambda x.x\}$.
Then $\theta(C)$, $\theta|_{\{?f\}}(C)$ and $\theta|_{\{?g\}}(C)$ are solvable by pattern unification.


Therefore solves nondeterministically computes a minimal set of variables by
simulating the process of solving the constraints by pattern unification. Every
time the process gets stuck, i.e. no more pattern constraints are left, a minimal
set of additional variables is instantiated.

The main complication is that pattern unification may introduce new variables.
Thus we need to keep track of where they came from, i.e. which original
variable needs to be instantiated in order to ensure that the new variable
becomes instantiated. Again, there may be a choice:


Example 4. Let $C = \{\lambda xyz.\ ?f\ z\ x =^? \lambda xyz.\ ?g\ x\ y\} \cup C'$ and let $\theta = \{?f \mapsto
\lambda xy.\ f\ y,\ ?g \mapsto \lambda xy.\ f\ x\} \cup \theta'$. Solving the first constraint in $C$ yields the
substitution $\sigma = \{?f \mapsto \lambda xy.\ ?h\ y,\ ?g \mapsto \lambda xy.\ ?h\ x\}$. Now we need to compute the value
of $?h$ (in case it is required later on in order to continue) and we need to record
that $?h$ depends on either $?f$ or $?g$, i.e. instantiating $?f$ or $?g$ will instantiate $?h$.
We can choose to record either dependency.


In the algorithm below, this dependency relation is stored in a partial function
$D$ mapping new variables to old ones they depend on.

Finally we introduce some terminology to select those variables that occur
in non-pattern positions: given an equality constraint $st$, $\mathit{NPVars}(st)$ is the set
of variables $?f$ that occur in subterms of the form $?f\ \bar t$ in $st$, such that $\bar t$ is not
a list of distinct bound variables.

Now we can describe $\mathit{solves}(C, \theta)$ as an imperative algorithm. As long as
there are pattern problems left in $C$, we solve them and propagate the solution.
Once we are left only with non-pattern problems, we pick one such that values
for all its non-pattern variables are known (via $\theta$). All those variables are then
instantiated and recorded in $V$. If there is no such non-pattern problem either,
the algorithm fails. Luckily, Theorem 3 tells us that this will not happen in the
cases we are interested in.


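\[
\begin{array}{l}
V := \emptyset;\ D := \{?f \mapsto ?f \mid ?f \in \mathit{Var}(C)\} \\
\textbf{while}\ C \neq \emptyset\ \textbf{do} \\
\quad \textbf{if}\ \text{there is a pattern problem}\ (s =^? t) \in C \\
\quad \textbf{then}\ \sigma := \mathit{mgu}(s,t);\ C := \sigma(C - \{s =^? t\}); \\
\qquad \textbf{forall}\ (?f \mapsto t) \in \sigma\ \textbf{do} \\
\qquad\quad \delta := \mathit{mgu}(\theta(?f), \theta(t)); \\
\qquad\quad D := D \cup \{?h \mapsto D(?f) \mid ?h \in \mathit{Dom}(\delta)\} \\
\qquad \textbf{od} \\
\quad \textbf{else}\ \text{pick some}\ st \in C\ \text{such that}\ \mathit{NPVars}(st) \subseteq \mathit{Dom}(\theta); \\
\qquad V := V \cup \{D(?f) \mid ?f \in \mathit{NPVars}(st)\}; \\
\qquad C := \theta|_{\mathit{NPVars}(st)}(C); \\
\textbf{od}; \\
\textbf{return}\ V
\end{array}
\]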


Note that the else-case does not necessarily compute a minimal $V$: if $st =
(\lambda xy.\ ?f\ x\ (?g\ y\ y) =^? \lambda xy.\ x)$ and $\theta(?f) = \lambda xy.\ x$ and $\theta(?g) = \lambda xy.\ y$ then it suffices
to instantiate either $?f$ or $?g$, whereas above both are added to $V$. The above
algorithm can be refined by instantiating $st$ stepwise from the top and normalizing
the result each time.


5 Implementation 


The algorithms for compression and reconstruction of proofs have been imple- 
mented in ML as a part of the theorem prover Isabelle. During the proof of a 
theorem, the corresponding proof term is built up incrementally. Since this may 
slow down the execution of proof scripts, the generation of proof terms can be 
switched on and off as needed: during the interactive development of a proof this 
feature may be switched off. When the proof is completed, the proof script may 
be re-run with proof generation switched on and the resulting proof term could 
be exported. 

We have tested the implementation on several proofs of theorems in Pelletier’s
collection [10], which were generated by Isabelle’s tableau prover. The following
table summarizes some results¹. It shows the number of terms (i.e. terms such as
$\phi$ and $t$ in $\lambda h{:}\phi.\ p$ and $(p\ t)$) occurring in the uncompressed proof term, as well
as the number of terms occurring in the compressed proof term (i.e. terms not
replaced by placeholders “_”, as explained in §4). In all cases, the compression
ratio was more than 90%. The compression rate reported by Necula and Lee [6]
appears to be a little better, but that is probably because we counted only the
number of terms, not their size: our dynamic compression scheme should drop
at least as much as Necula and Lee’s static scheme.

¹ These measurements were done on a Pentium II with 400 MHz and 512 MB RAM


[Table: for each tested problem, the number of terms in the uncompressed and in the
compressed proof term, the compression ratio, and the reconstruction time [s].]


The following diagram shows the correspondence between the number of terms 
in the uncompressed proof term and the time needed for reconstruction. 


[Figure: reconstruction time [s] plotted against the number of terms in the
uncompressed proof term (0 to 2500).]


A crucial point is the efficient handling of large sets of constraints. Once a
unifier for a constraint has been computed, one possibility is to apply it to
all the remaining constraints. Another possibility is to accumulate the unifiers
and only apply them when needed. The first solution can be rather slow when
there is a large number of constraints, while the second solution, which has been
chosen in our implementation, requires efficient data structures for storing
substitutions. To speed up reconstruction, our implementation of function collect
described in §4.1 tries to solve newly introduced constraints immediately, instead
of collecting all constraints first and then solving them at the end.
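
The difference between the two strategies can be sketched as follows. This Python
fragment is an illustration only and does not reflect the data structures of the
ML implementation; first-order terms in a hypothetical tuple encoding (strings
beginning with "?" for variables, `(symbol, arg, ...)` for applications) stand in
for $\lambda$-terms, and the names `apply_eagerly` and `LazySubst` are made up.

```python
def apply_eagerly(constraints, var, term):
    """First strategy: rewrite every remaining constraint as soon as var is bound."""
    def sub(t):
        if t == var:
            return term
        if isinstance(t, tuple):
            return (t[0],) + tuple(sub(a) for a in t[1:])
        return t
    return [(sub(a), sub(b)) for a, b in constraints]

class LazySubst:
    """Second strategy: accumulate bindings and apply them only on demand."""

    def __init__(self):
        self.bindings = {}

    def bind(self, var, term):
        # Constant-time: no other term or constraint is traversed here.
        self.bindings[var] = term

    def walk(self, term):
        """Follow bindings until an unbound variable or a compound term is reached."""
        while isinstance(term, str) and term in self.bindings:
            term = self.bindings[term]
        return term

    def resolve(self, term):
        """Fully instantiate a term; only needed when the result is inspected."""
        term = self.walk(term)
        if isinstance(term, tuple):
            return (term[0],) + tuple(self.resolve(a) for a in term[1:])
        return term

s = LazySubst()
s.bind("?x", ("f", "?y"))
s.bind("?y", "a")
print(s.resolve(("g", "?x")))
print(apply_eagerly([("?x", ("g", "?y"))], "?y", "a"))
```

The eager variant touches every constraint on each binding, whereas the lazy
variant pays only when a term is looked up, at the price of having to chase
chains of bindings.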


6 Conclusion 


We have given a first presentation of proof terms for simply typed intuitionistic 
higher-order logic, an important logical framework. We hope that by unfolding 
the underlying type theory and explicitly isolating the simply typed components
familiar to users of Isabelle or HOL, this may popularize the $\lambda$-calculus view of
proofs in those quarters as well. We have also presented what appears to be 
a novel compression scheme for proofs. Hence our work provides a new and 
promising basis for exchanging proofs in simply typed logics, in particular HOL, 
both among theorem provers (especially automatic and interactive) and in the 
realm of proof carrying code.

Abstract. We provide a proof using HOL and SPIN of convergence for
the Routing Information Protocol (RIP), an internet protocol based on 
distance vector routing. We also calculate a sharp realtime bound for this 
convergence. This extends existing results to deal with the RIP standard 
itself, which has complexities not accounted for in theorems about ab- 
stract versions of the protocol. Our work also provides a case study in 
the combined use of a higher-order theorem prover and a model checker. 
The former is used to express abstraction properties and inductions, and 
structure the high-level proof, while the latter deals efficiently with case 
analysis of finitary properties. 


1 Introduction 


The high connectivity on which the Internet relies is enabled by scalable and 
robust protocols that enable routers connecting different physical networks to 
forward packets toward destinations described in a uniform addressing system. 
The first Internet routing protocols were based on distance vector routing, which 
uses information about distance and direction to a destination to route packets. 
The first such protocol standardized by the Internet Engineering Task Force 
(IETF) was the Routing Information Protocol (RIP), and this protocol remains 
in widespread use today. Although the correctness of distance vector routing has 
been proved for theoretical versions of the algorithm, the RIP standard itself 
has never been proved to have some of the properties it is expected to possess. 
Since there exist non-trivial differences between the abstract version and the 
standard itself, proofs of some key properties of the standard are worthwhile. 
In this paper we carry out the proof of convergence using a combination of 
the HOL [5,9] higher-order theorem prover and the SPIN model checker [10,17]. 
The automated assistance reduces the burden of case analysis in parts of the 
standard where manual analysis would prove tedious. Moreover, the HOL/SPIN 
proof provides high confidence for RIP and insights into the techniques needed 
to address other routing protocols, most of which are more complex than RIP. 
Routing protocols are meant to be robust with respect to failures of links and 
routers. If there is a failure then the routers communicate this information and 
routing tables are updated to route around the failed link or router. This process 
takes some time since routers cannot possess instantaneous global knowledge 
of network characteristics. They therefore pass information that is incomplete
and, if the protocol has the right characteristics, they eventually converge on a
suitable set of alternative routes. We have two results: we show that the RIP 
protocol will converge after a failure, and we calculate a sharp realtime bound on 
the time this will take as a function of the radius of the network. Both results are 
based on assumptions about network reliability and timing assumptions specified 
in the RIP protocol. 


The first proof concerns the convergence of the asynchronous distributed 
Bellman-Ford protocol as specified in the IETF RIP standard [8,12]. The classic 
proof of a ‘pure’ form of the protocol is given in [1]. Our result covers additional 
features included in the standard to improve realtime response times (e.g. split 
horizons and poison reverse). These features add additional cases to be considered
in the proof, but the automated support reduces the impact of this complexity.
Adding these extensions makes the theory better match the standard, and
hence also its implementations. Our proof also uses a different technique from 
the one in [1], providing some noteworthy properties about network stability. 


Our second proof provides a sharp realtime convergence bound on RIP in 
terms of the radius of the network around its nodes. In the worst case, the 
Bellman-Ford protocol has a convergence time as bad as the number of nodes 
in the network. However, if the maximum number of hops any source needs 
to traverse to reach a destination is k (the radius around the destination) and 
there are no link changes, then RIP will converge in k timeout intervals for this
destination. From our first proof of convergence, it is easy to see that this occurs
within 2·(k − 1) intervals, but the proof of the sharp bound of k is complicated
by the number of cases that need to be checked: we show how to use automated 
support to do this verification, based on the approach developed in the previous 
case study supplemented by a new invariant. Thus, if a network has a maximum 
radius of 5 around each of its destinations, then it will converge in at most 5 
intervals, even if the network has 100 nodes. Assuming the timing intervals in 
the RIP standard, such a network will converge within 15 minutes if there are 
no link changes. 
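
The arithmetic behind this example can be replayed on a simplified model. The
Python sketch below is not the RIP standard: it assumes synchronous rounds, one
round per timeout interval, a cold start in which only the destination knows its
own distance, and a timeout interval of 180 seconds (the value implied by 5
intervals taking 15 minutes). The topology, 20 chains of 5 routers attached to
one destination, and all names are made up for illustration.

```python
INFINITY = 16            # hop count that RIP uses for "unreachable"
TIMEOUT_SECONDS = 180    # assumed length of one timeout interval

def rounds_to_converge(neighbours, dest):
    """Synchronous distance-vector updates; returns the number of rounds that change a table."""
    dist = {n: INFINITY for n in neighbours}
    dist[dest] = 0
    rounds = 0
    while True:
        new = {
            n: 0 if n == dest
            else min([INFINITY] + [dist[m] + 1 for m in neighbours[n]])
            for n in neighbours
        }
        if new == dist:
            return rounds
        dist, rounds = new, rounds + 1

def star_of_chains(chains, length):
    """A destination with `chains` disjoint paths of `length` routers hanging off it."""
    neighbours = {"dest": []}
    for c in range(chains):
        prev = "dest"
        for i in range(1, length + 1):
            node = (c, i)
            neighbours[node] = [prev]
            neighbours[prev].append(node)
            prev = node
    return neighbours

net = star_of_chains(chains=20, length=5)       # 101 nodes, radius 5
k = rounds_to_converge(net, "dest")
print(len(net), "nodes, converged after", k, "rounds")
print("sharp bound:", k * TIMEOUT_SECONDS / 60, "minutes")
print("weaker bound:", 2 * (k - 1) * TIMEOUT_SECONDS / 60, "minutes")
```

In this model the number of rounds equals the radius and is independent of the
number of nodes, which is the shape of the bound stated above; the asynchronous
behaviour covered by the actual proof is not captured.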


The basis of our verification is the RIP standard. Early implementations of 
distance vector routing were incompatible, so all of the routers running RIP in 
a domain needed to use the same implementation. Users and implementors were 
led to correct this problem by providing a specification that would define precise 
protocols and packet formats, leading to the first version of the standard [8]. In 
time this standard was revised to a second version [12]. At the level of abstraction
we use here, our proof is applicable to both of these versions. 


There have been a variety of successful formal studies of communication pro- 
tocols. However, most of the studies to date have focused on endpoint protocols 
(that is, protocols between pairs of hosts) using models that involve two or three 
processes (representing the endpoints, or the endpoints and an adversary, for 
instance). Studies of routing protocols must have a different flavor since a proof 
that works for two or three routers is not interesting unless it can be generalized.
Routing protocols generally have the following attributes which influence
the way formal verification techniques can be applied: 


1. An (essentially) unbounded number of replicated, simple processes execute
   concurrently.

2. Dynamic connectivity is assumed and fault tolerance is required.

3. Processes are reactive systems with a discrete interface of modest complexity.

4. Real time is important and many actions are carried out with some timeout
   limit or in response to a timeout.


Most routing protocols have other attributes such as latencies of information flow 
(limiting, for example, the feasibility of a global concept of time) and the need to 
protect network resources. These attributes sometimes make the protocols more 
complex. For instance, the asynchronous version of the Bellman-Ford protocol 
is much harder to prove correct than the synchronous version [1], and the RIP 
standard is still harder to prove correct because of the addition of complicating 
optimizations intended to reduce latencies. 

Following this introduction we give a description of the Routing Information 
Protocol as specified in its standard. We then describe our formalization of RIP 
in HOL and SPIN. In the fourth and fifth sections we show the convergence 
of RIP and derive a sharp realtime bound for the convergence. In the sixth 
section we provide some analysis of our methodology including a discussion of 
the benefits of automation and some crude measurements of the complexity of 
the proofs as viewed by the automated tools and the person carrying out the 
verification respectively. Our final section summarizes the conclusions. 


2 Routing Information Protocol 


The RIP protocol specification is given in [8,12] and a good exposition can be 
found in [11]. We start by describing the general networking environment and the 
task of a routing protocol. Then we give a brief description of the RIP protocol, 
including its pseudocode (Appendices A.1, A.2). Finally, we discuss differences 
between the standard and the underlying theory and the way they affect protocol 
requirements. 


2.1 Routing in Internetworks 


An internet is a family of networks connected by routers. Figure 1 illustrates 
an internetwork with four networks (shown as clouds) and four routers (shown 
as black squares). The goal of the routers is to forward packets between hosts 
(shown as circles) that are attached to the networks. The routers use routing
tables which they develop through running a distributed routing protocol. Packets 
from hosts travel in hops across networks linked by routers. Each router chooses 
a link on which to forward the packet based on the packet’s destination address 
and other parameters. In order to be able to make good forwarding decisions, 
routers need to maintain partial topology information in the routing tables.


Fig. 1. An Internet


The aim of a routing protocol is to establish a procedure for updating these 
tables. In most cases, routing information can be exchanged only locally (i.e. 
between neighboring routers). However, the overall goal of a routing protocol 
is to establish good global paths (between distant hosts on the internet). An 
interface is the link between a router and a network. In this example, router
r1 has interfaces i1, i2 and i3, which connect it to the networks n1, n2 and
n3 respectively. Hosts h1 and h2 belong to the network n1. Routers are said to
be neighbors if they have interfaces to a common network. In our example, all
routers are neighbors of r1, but r2 and r4 are not neighbors.


2.2 Routing Information Protocol 


Each RIP router maintains a routing table. A routing table contains one entry 
per destination network, representing the current best route to the destination. 
An entry corresponding to destination d has the following fields: 


— hops: number of hops to d (i.e. total number of routers that a message sent 
along that route traverses before reaching the network d - this includes the 
router where this entry resides). This is sometimes called a metric for d. 

— nextR: next router along the route to d. 

— nextIface: the interface that will be used to forward packets addressed to d.
It uniquely identifies the next network along the route. 


Routers periodically advertise their routing tables to their neighbors. Upon 
receiving an advertisement, the router checks whether any of the advertised 
routes can be used to improve current routes. Whenever this is the case, the 
router updates its current route to go through the advertising neighbor. Routes 
are compared exclusively by their length, measured in the number of hops. 

The value of hops must be an integer between 1 and 16, where 16 has the 
meaning of infinity (a destination with hops attribute set to 16 is considered 
unreachable). Hence, RIP will not be appropriate for internets that contain a 
router and a destination network that are more than 15 hops apart from each 
other.

Appendices A.1 and A.2 show pseudocode for RIP. A router advertises its
routes by broadcasting RIP packets to all of its neighbors. A RIP packet contains 
a list of (destination, hops)-pairs. A receiving router compares its current metric 
for destination to (1 + hops), which is the metric of the alternative route, and 
updates the corresponding routing entry if the alternative route is shorter. There 
is one exception to this rule—if the receiving router has the advertising router 
as nextR for the route, it adopts the alternative route regardless of its metric. 
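
The receive rule just described can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the pseudocode of Appendices A.1 and A.2: the names `Route`, `INFINITY` and `receive` are introduced here for illustration only, the interface field is omitted, and capping the alternative metric at 16 is an assumption that follows from the permitted range of hops.

```python
from dataclasses import dataclass

INFINITY = 16  # hop count that RIP treats as "unreachable"


@dataclass(frozen=True)
class Route:
    hops: int    # metric for the destination
    next_r: str  # next router along the route (nextR)


def receive(current: Route, advertiser: str, advertised_hops: int) -> Route:
    """Return the routing entry after processing one (destination, hops) pair."""
    # Metric of the alternative route through the advertiser (assumed capped at 16).
    alternative = min(advertised_hops + 1, INFINITY)
    if advertiser == current.next_r:
        # Exception: an advertisement from the current next router is always adopted.
        return Route(alternative, advertiser)
    if alternative < current.hops:
        # Ordinary case: switch only to a strictly shorter route.
        return Route(alternative, advertiser)
    return current
```

For example, a router holding a 5-hop route via `a` that hears a 2-hop advertisement from `b` switches to a 3-hop route via `b`, whereas an advertisement from `a` itself is adopted even when it is worse.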

Normally, a RIP packet contains information that reflects the advertising 
router’s own routing table. This rule has an exception too: routers do not
advertise routes on the interfaces through which they were learned. Precisely,
if a route is learned over the interface i, it should be advertised on that
interface with hops set to 16 (infinity). This rule is called split horizon with poisoned
reverse and its purpose is to prevent creation of small routing loops. 
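
On the sending side, split horizon with poisoned reverse amounts to a single choice of the advertised metric. The sketch below is illustrative only; the function name `advertised_hops` and its parameters are assumptions introduced here, not names from the standard.

```python
INFINITY = 16  # hop count that RIP treats as "unreachable"


def advertised_hops(hops: int, learned_on: str, out_iface: str) -> int:
    """Metric to place in a RIP packet sent on out_iface for one route."""
    if out_iface == learned_on:
        # Poisoned reverse: advertise the route back as unreachable.
        return INFINITY
    # On every other interface the router reports its own metric.
    return hops
```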

Each routing table entry has a timer expire associated with it. Every time 
an entry is updated (or created), expire is re-set to 180 seconds. Routers try 
to advertise every 30 seconds, but due to network failures and congestion some 
advertisements may not get through. If a route has not been refreshed for 180 
seconds, the router will assume that there was a link failure, the destination 
will be marked as unreachable and a special garbageCollect timer will be set to 
120 seconds. If this timer expires before the entry gets updated, the route is 
expunged from the table. 
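
The two timers can be pictured as a small state machine per routing entry. The following Python sketch assumes a clock that advances in one-second ticks; the names `Entry`, `refresh` and `tick` are hypothetical, and only the 180-second and 120-second values come from the description above.

```python
from dataclasses import dataclass
from typing import Optional

EXPIRE_SECONDS = 180           # lifetime of an entry without a refresh
GARBAGE_COLLECT_SECONDS = 120  # grace period before an entry is expunged
INFINITY = 16


@dataclass
class Entry:
    hops: int
    expire: int = EXPIRE_SECONDS
    garbage_collect: Optional[int] = None  # None while the route is alive


def refresh(entry: Entry, hops: int) -> None:
    """An update (or creation) resets expire to 180 seconds."""
    entry.hops = hops
    entry.expire = EXPIRE_SECONDS
    entry.garbage_collect = None


def tick(entry: Entry) -> bool:
    """Advance the clock by one second; return False once the entry is expunged."""
    if entry.garbage_collect is None:
        entry.expire -= 1
        if entry.expire == 0:
            # Not refreshed for 180 seconds: assume a link failure.
            entry.hops = INFINITY
            entry.garbage_collect = GARBAGE_COLLECT_SECONDS
        return True
    entry.garbage_collect -= 1
    return entry.garbage_collect > 0
```

Under these assumptions an entry that is never refreshed is marked unreachable after 180 ticks and removed after 300 ticks in total.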


2.3 Standard vs. Theory


The mathematical theory behind RIP is described in [1] as the Asynchronous 
Distributed Bellman-Ford Algorithm (ADBF). In the ADBF model, at every 
point in time, a router is either idle, sending an advertisement, or receiving an 
advertisement. The routing table is updated upon receiving an advertisement. 
Details of the proof that ADBF finds shortest routes are presented in [1]. 

An interesting question is: ‘Can we use (essentially) the same proof to show
that the RIP protocol converges to the set of shortest routes?’ It turns out that
the answer is quite certainly ‘no’. Although motivated by the ADBF, the RIP
standard [8,12] differs from it in several important details:


— ADBF has ‘more powerful bookkeeping’. In RIP, routers keep track of only 
one (current best) route to each destination. On the other hand, ADBF 
nodes keep, for each destination, the most recent routes through each of 
the neighbors. Correspondingly, this would be reflected in the pseudocode 
(Appendices A.1, A.2) by all subscript indices becoming (dest,neighbor), 
instead of just dest. This makes ADBF more flexible, which comes at the 
expense of maintaining a larger data structure. 

— RIP has ‘blind’ updates. As a consequence of the previous difference, RIP
routers need to separately handle the case when an advertisement is received
from a neighbor which is already nextR for the route. In this case, the
receiving router can do nothing better than blindly accept the advertised route,
regardless of its length. ADBF does not have this special case.

— RIP’s route length is bounded. RIP can handle routes of at most 15 hops.
Distances of 16 or more hops are all considered equivalent to infinity. This 
is a practical optimization intended to balance the tradeoff between quicker 
loop elimination and greater range for routing information propagation. 

— RIP has the split horizon with poisoned reverse rule. This is another
engineering optimization, not present in ADBF.


The first of the above gaps alone would be enough to make proofs of convergence 
requirements for RIP substantially different from proofs for ADBF. Besides mat- 
ching the RIP setting closely, our proof technique also gives useful insights about 
the speed of propagation of updates, which can be used for establishing timing 
bounds for convergence. 


3 Formal Specification of RIP 


In the previous section, we gave a short description of the RIP standard along 
with its pseudo-code. In this section, we present a formal specification of the 
protocol that can be analyzed by HOL90 and SPIN. First, we make some sim- 
plifications of the protocol: 


1. We observe that RIP (Appendices A.1, A.2) operates independently for every 
destination, with no interaction between the state or events associated with 
different destinations. This means that we need to specify and prove the 
protocol only for a single destination and the result will hold for the general 
version as well. 

2. We only analyze the protocol in between topology changes. When the proto- 
col starts, it may have any sound state to begin with. However, once it has 
started, one must give the protocol a reasonable period of time to converge. 
So we assume that there are no topology changes in the lifetime of the ana- 
lysis. Under this assumption, the protocol indeed converges as we show in 
Section 4. Moreover, in Section 5, we precisely characterize the time period 
for which there must be no topology changes to guarantee convergence. 

3. We abstract away from actual timing constraints. If topology changes are
ruled out, routes cannot be expired ($expire_{dest}$) or deleted ($garbageCollect_{dest}$).
So the only timing constraint left is the time interval between periodic broad- 
casts of advertisements. We model this by (a) enabling a router to broadcast 
advertisements at any time (a safe abstraction), and (b) adding a fairness 
assumption to the broadcast sequence. 


We next specify RIP in HOL for analysis in the HOL90 theorem prover. Then 
we model the protocol in Promela, the specification language for the SPIN
model-checker. The Promela modeling is straightforward: it simply involves rewriting
the pseudo-code in terms of Promela’s C-like syntax and SPIN’s event semantics.
The HOL specification is more involved, since we need to transform the
pseudo-code into a functional specification.


3.1 RIP in HOL


For a RIP router, the universe U is a bipartite, connected graph whose nodes 
are partitioned into networks and routers, connected through interfaces. Routers 
are always connected to at least two networks. We specify networks and routers
using distinct uninterpreted type variables: 'network, 'router. Now any specific
universe can be described simply by a function conn : 'router -> 'network -> bool,
which describes the interfaces, that is, which routers and networks are connected
with each other. A function conn describes a valid universe U if (a) conn connects
each router to at least 2 networks, and (b) conn describes a connected graph.
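
The two validity conditions can be checked mechanically for a finite universe. The Python sketch below is an illustration only and is not the HOL definition: `conn` is modeled as a Boolean function of a router and a network, and the name `is_valid_universe` is an assumption introduced here.

```python
from collections import deque


def is_valid_universe(routers, networks, conn):
    """Check conditions (a) and (b) for a connectivity predicate conn(r, n)."""
    # (a) every router is attached to at least two networks
    for r in routers:
        if sum(1 for n in networks if conn(r, n)) < 2:
            return False
    # (b) the bipartite router/network graph is connected (breadth-first search)
    nodes = [("router", r) for r in routers] + [("network", n) for n in networks]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        kind, x = queue.popleft()
        if kind == "router":
            adjacent = [("network", n) for n in networks if conn(x, n)]
        else:
            adjacent = [("router", r) for r in routers if conn(r, x)]
        for node in adjacent:
            if node not in seen:
                seen.add(node)
                queue.append(node)
    return len(seen) == len(nodes)
```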

When the RIP protocol starts operating in a universe U, it is given as input
a valid conn function, describing the topology of U, and an initial state $s_0$. The
protocol then seeks to compute paths from each router to the destination d. We 
describe the HOL specification in three steps: (1) the state of the protocol, along 
with an initial state assumption, (2) the processes that change the state and pass 
messages to each other, and (3) the semantics of these processes in HOL, and 
typical properties they are expected to satisfy. 


Protocol State The goal of RIP is to compute an optimal path at each router 
r to the destination network d. The path is described by a routing entry: the 
number of hops to d, the next router (nextR) along this path, and the network 
between r and nextR (nextN). RIP only computes paths of length less than 16;
destinations 16 or more hops away are considered unreachable.

The protocol state consists of a table of the current routing entries at each 
router r, which we call the routing table (rtable). A protocol state is defined
as a 3-tuple s : rtable whose components are hops : 'router -> num,
nextN : 'router -> 'network, and nextR : 'router -> 'router. In addition, we want all
protocol states to be sound, where soundness is defined as follows:


Definition 1 (Soundness). A protocol state s = (hops, nextN, nextR) of a uni- 
verse described by a valid conn is said to be sound with respect to d if 


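1. $\forall r : \texttt{'router}.\ conn\ r\ (nextN(r)) \land conn\ (nextR(r))\ (nextN(r))$

2. $\forall r : \texttt{'router}.\ 1 \leq hops(r) \leq 16$

3. $\forall r : \texttt{'router}.\ (conn\ r\ d) \Rightarrow (hops(r) = 1) \land (nextN(r) = d) \land (nextR(r) = r)$

4. $\forall r : \texttt{'router}.\ \neg(conn\ r\ d) \Rightarrow (hops(r) > 1) \land (nextN(r) \neq d) \land (nextR(r) \neq r)$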
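
The four soundness conditions translate directly into a check over a finite set of routers. The sketch below is illustrative Python, not the HOL definition: the three state components are represented as dictionaries, and the name `is_sound` is an assumption made here.

```python
def is_sound(routers, conn, d, hops, next_n, next_r):
    """Check the four soundness conditions for dictionaries hops, next_n, next_r."""
    for r in routers:
        # 1. r and its next router are both attached to the network nextN(r)
        if not (conn(r, next_n[r]) and conn(next_r[r], next_n[r])):
            return False
        # 2. the metric lies between 1 and 16
        if not 1 <= hops[r] <= 16:
            return False
        if conn(r, d):
            # 3. routers attached to d route to it directly
            if not (hops[r] == 1 and next_n[r] == d and next_r[r] == r):
                return False
        else:
            # 4. other routers do not claim to be attached to d
            if not (hops[r] > 1 and next_n[r] != d and next_r[r] != r):
                return False
    return True
```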


We stipulate that the initial state of the protocol, $s_0$, must be sound. Observe
that soundness really has to do with the ‘local’ connections at a router, which 
are typically configured by mechanisms external to RIP. By stipulating that the 
initial state is sound, we require that the router is never deluded about its local 
topology, otherwise there is no guarantee that it will ever discover global path 
information. Put another way, if the system ever gets into an unsound state, 
convergence cannot be guaranteed. Note, however, that we can only assume the
initial state to be sound; we need to prove that all succeeding states will remain
sound under RIP.


Processes We represent different event handlers in the protocol by different
processes; they typically perform different kinds of actions and may do so in 
parallel. As a result, there are three kinds of processes in the universe: each 
router r has an advertising process (generating advertisements), and a routing 
process (handling packet reception), and at each network net there is a network 
process (performing broadcasts). 

The advertising process persistently broadcasts route advertisements on each 
of its connected networks. Each such advertisement is a tuple (src, hopcount), 
saying that the broadcasting router src knows of a path of length hopcount to
the destination d. Suppose the protocol state is s = (hops, nextN, nextR), then 
the hopcount advertised by src on net may have the following values: 


— if net = nextN(src), then hopcount = 16 (Infinity);
— otherwise, hopcount = hops(src).


When an advertisement is to be broadcast on network net, it is handed over 
to the network process for net. The network process executes the broadcast by 
attempting to deliver the incoming advertisement to all routers rcv connected 
to the net. We do not assume that the network is reliable in any way, so it may 
not deliver the advertisement to any router, or it may deliver it to some of the 
routers in an arbitrary order. However, we make the following assumptions:


— Fairness: the network cannot ignore a router forever. So in any execution of 
the network net, if a router src sends advertisements infinitely often, and rcv 
is another router connected to net, the network process must deliver src’s 
advertisements to rcv infinitely often. 

— Zero Delay: We assume that if the network does deliver an advertisement, it 
does so instantaneously. 


We call the tuple (src, net, rcv, hopcount), corresponding to the delivery of an 
advertisement, an advertisement event. Observe that the unreliability of the net- 
works in conjunction with the persistent broadcasts of the advertisement proces- 
ses allows many possible sequences of advertisement events. In fact, the network 
and advertising processes can generate every possible sequence of (src, net, rcv) 
tuples in advertisement events, subject to the fairness assumption and the fact 
that src and rcv must both be connected to net. The only advertisement field 
that depends on the network state is the hopcount. 

The third process in the system is the routing process at each router. The
routing process at router rcv reacts to incoming advertisements and updates
the routing table entry, (hops(rcv), nextN(rcv), nextR(rcv)), at rcv. Essentially,
if an advertisement, (src, hopcount), arriving at rcv through net, is such that
$hopcount + 1 < hops(rcv)$ or $src = nextR(rcv) \land net = nextN(rcv)$, then the
routing table at rcv is updated so that hops(rcv) = hopcount + 1, nextN(rcv) = net
and nextR(rcv) = src. In HOL, we represent this process by a state update
function, update : rtable -> ('router * 'network * 'router * num) -> rtable,
which, given a protocol state, (hops, nextN, nextR), and an advertisement event
(src, net, rcv, hopcount), computes the new protocol state. The HOL code for the
update function is given for illustration in Appendix A.3.


Trace Semantics The observable behavior of the network and advertising
processes is essentially an infinite sequence of advertisement events. Therefore, we
choose to express the semantics of these processes as an event trace: an
infinite sequence of tuples $(src_i, net_i, rcv_i)$, representing advertisement events. Such
a trace is considered valid only if


— the trace is fair: $\forall r_1, r_2 : \texttt{'router}.\ \forall i.\ \exists j > i.\ (src_j = r_1) \land (rcv_j = r_2)$, and
— the events are possible: $\forall i.\ (conn\ src_i\ net_i) \land (conn\ rcv_i\ net_i)$.


The hopcount field of the advertisement can be filled in as follows: Suppose
that at the $i$-th step (event) of the protocol, the state of the protocol is $s_i$ =
(hops, nextN, nextR), then


— if $nextN(src_i) = net_i$, then $src_i$ sends an advertisement $(src_i, 16)$ to $rcv_i$
instantaneously via $net_i$;

— otherwise, $src_i$ sends an advertisement $(src_i, hops(src_i))$ to $rcv_i$
instantaneously via $net_i$.


Given an event trace, the routing processes react to the events and update
the protocol state. This produces an infinite state sequence of the protocol $s_i$
defined as follows:


— $s_0$ is any sound state
— $s_{i+1} = update\ s_i\ (src_i, net_i, rcv_i, hopcount_i)$, where the hopcount field is filled
in as described above.


Thus the semantics of the update processes is the state sequence it can generate 
for a given event trace. All properties desired of the protocol are expressed and 
proved in terms of this state sequence. In particular, the convergence theorem 
states that, given any valid event trace, the states generated in the sequence 
must converge to the optimal routing table. 
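
The trace semantics can be exercised on a small example. The following Python sketch is an illustration under stated assumptions and is not the HOL code of Appendix A.3: states are triples of dictionaries, the new metric is assumed to be capped at 16, events are assumed to have distinct sender and receiver, and the chain topology d, r1, n1, r2, n2, r3 is a hypothetical example.

```python
INFINITY = 16


def update(state, event):
    """One step of the state sequence: apply an advertisement event."""
    hops, next_n, next_r = state
    src, net, rcv, hopcount = event
    new_hops = min(hopcount + 1, INFINITY)  # cap at 16 is an assumption
    if new_hops < hops[rcv] or (src == next_r[rcv] and net == next_n[rcv]):
        # Functional update: build new tables instead of mutating the old ones.
        return ({**hops, rcv: new_hops}, {**next_n, rcv: net}, {**next_r, rcv: src})
    return state


def run(s0, trace):
    """Yield the states s_0, s_1, ... for a finite prefix of an event trace."""
    state = s0
    yield state
    for src, net, rcv in trace:
        hops, next_n, _ = state
        # Split horizon with poisoned reverse fills in the hopcount field.
        hopcount = INFINITY if next_n[src] == net else hops[src]
        state = update(state, (src, net, rcv, hopcount))
        yield state


# Hypothetical chain: r1 is attached to d; n1 links r1 and r2; n2 links r2 and r3.
s0 = (
    {"r1": 1, "r2": 16, "r3": 16},          # hops
    {"r1": "d", "r2": "n1", "r3": "n2"},    # nextN
    {"r1": "r1", "r2": "r1", "r3": "r2"},   # nextR
)
trace = [("r2", "n2", "r3"), ("r1", "n1", "r2"), ("r2", "n2", "r3")]
states = list(run(s0, trace))
```

In this run the first event changes nothing of substance, because r2 still advertises 16; after the second and third events r2 and r3 hold routes of length 2 and 3 respectively.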


3.2 RIP in Promela 


Promela [10,17] is a natural specification language for network protocols. In ad- 
dition to C-like programming constructs, it supports non-determinism, dynamic 
processes, and synchronous/asynchronous channel communication between pro- 
cesses. We translate the pseudo-code given in Appendices A.1, A.2 into Promela. 
A fragment of the resulting Promela code corresponding to the routing process 
is shown in Appendix A.4. 

As in the HOL specification, at each router, we have a routing process and an
advertisement process. The advertisement process is a simple non-terminating
while loop that keeps sending advertisements to all its neighboring networks. The
routing process waits for input advertisements and processes them as before.
Finally we have a network process for each network, which simply implements 
the broadcast mechanism by taking advertisements sent to the network and 
transporting them to the input buffers of all the routers connected to it. It is
only the network processes that know the topology of the network; the routing
and advertisement processes only know the networks they are directly connected
to.

Once all the above Promela processes are in place, we use SPIN to simulate 
the protocol for sample topologies to check our model. We can also verify that 
the protocol works for small topologies. A point worth noting is that in varying 
the topologies, all we need to change is the encoding of the network processes. 
The routing and advertising processes operate above this connection layer. In 
effect, the network processes can pretend to have an arbitrary topology and 
the routing/advertisement processes will not know the difference. We use this
property later in our SPIN proofs of convergence, where we fool a solitary update
process into believing it is part of a larger network.


4 Convergence of RIP 


In this section we present a proof of convergence for RIP. We prove that, in the 
absence of topology changes, RIP will find shortest routes to the destination d, 
from every router inside the range of 15 hops. 


4.1 Proof Results 


On the outermost level, our proof uses induction on distance from the destina- 
tion. For each router r, distance to d is defined as 


$$D(r) = \begin{cases} 1, & \text{if } conn\ r\ d \\ 1 + \min\{D(s) \mid s \text{ neighbor of } r\}, & \text{otherwise.} \end{cases}$$


For $k \geq 1$, the k-circle around d is the set of routers

$$C_k = \{r \mid D(r) \leq k\}.$$
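
For a finite universe, the distance to d and the k-circle can be computed by breadth-first search, which yields the least solution of the recursive definition. The Python sketch below is illustrative only; the function names `distances` and `circle` are assumptions introduced here, and `conn` is modeled as a Boolean function of a router and a network.

```python
from collections import deque


def distances(routers, networks, conn, d):
    """Compute the distance to d for every router that can reach d."""
    dist = {r: 1 for r in routers if conn(r, d)}
    queue = deque(dist)
    while queue:
        s = queue.popleft()
        for n in networks:
            if not conn(s, n):
                continue
            for r in routers:
                # r is a neighbor of s if both have an interface to n
                if conn(r, n) and r not in dist:
                    dist[r] = dist[s] + 1
                    queue.append(r)
    return dist


def circle(dist, k):
    """The k-circle: routers at distance at most k from d."""
    return {r for r, distance in dist.items() if distance <= k}
```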
The key notion in our proof is that of the k-stability: 


Definition 2 (Stability). For $k \geq 1$, we say that the universe is k-stable if
both of the following properties hold: 


(S1) Every router from the k-circle has its metric set to the actual distance to 
d. Moreover, if r is not connected to d, it has its nextR set to the first router 
on some shortest path to d: 


$$\forall r.\ r \in C_k \Rightarrow hops(r) = D(r) \land (\neg conn\ r\ d \Rightarrow D(nextR(r)) = D(r) - 1)$$

(S2) Every router outside the k-circle has its hops strictly greater than k:

$$\forall r.\ r \notin C_k \Rightarrow hops(r) > k.$$
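
Conditions (S1) and (S2) can be checked for a concrete state once the true distances are known. The sketch below is illustrative Python rather than the HOL definition; the name `is_k_stable` is an assumption, and `dist` is assumed to map every router to its distance from d.

```python
def is_k_stable(k, routers, conn, d, dist, hops, next_r):
    """Check (S1) and (S2), given the true distances dist[r]."""
    for r in routers:
        if dist[r] <= k:
            # (S1) inside the k-circle: exact metric, next router one hop closer
            if hops[r] != dist[r]:
                return False
            if not conn(r, d) and dist[next_r[r]] != dist[r] - 1:
                return False
        elif hops[r] <= k:
            # (S2) outside the k-circle the metric must exceed k
            return False
    return True
```

On a three-router chain with distances 1, 2 and 3, a state with metrics 1, 16, 16 is 1-stable but not 2-stable, while a state with metrics 1, 2, 2 violates (S2) for k = 2.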


Our goal is to prove that under any fair advertisement trace, the universe is 
guaranteed to become k-stable, for every k < 16. This proof will be carried out 
by induction on k. 

Recall our stipulation that the universe starts in a sound state. It is easy to 
show that sound states are 1-stable, so this gives us the basis of induction:


Lemma 1. The universe U is initially 1-stable.


A key property of k-stability is that once it is achieved, it is preserved
forever. This would not be true if our definition of stability did not contain condition
S2. This condition strengthens the induction hypothesis enough that we can
induct on k-stability.


Lemma 2 (Preservation of stability). For any k < 16, if the universe U 
is k-stable at some point, then it remains k-stable after an arbitrary number of 
advertisements. □


Lemma 1 is easily proved in HOL using the definition of soundness. We also 
prove Lemma 2 in HOL, and it involves a significantly larger case analysis. 

Progress from k-stability to (k + 1)-stability happens gradually: more and
more routers start to conform to the conditions S1 and S2. This is why we need an
additional, more refined, definition of stability which captures individual routers, 
rather than entire ‘circles’ of routers. 


Definition 3. Given a k-stable universe, we say that a router r at distance k+1 
from d is (k + 1)-stable if it has an optimal route: 


$$hops(r) = D(r) = k + 1 \;\land\; nextR(r) \in C_k.$$


To prove that a k-stable universe eventually becomes (k+1)-stable, it suffices 
to show that every router at distance (k + 1) eventually becomes (k + 1)-stable. 
This is the statement of the following lemma: 


Lemma 3. For any k < 15, and any router r such that D(r) = k + 1, if the
universe U is k-stable at some point and the advertisement trace is fair, then r
will eventually become (k + 1)-stable. Moreover, r then remains (k + 1)-stable
indefinitely. □


One of the key facts used in the proof of this lemma is fairness of the adverti- 
sement trace. Without fairness, neighbors of r would be allowed to simply stop 
advertising to r at any point. This would keep r’s routing table unchanged and 
hence prevent it from ever achieving (k + 1)-stability. 

Observe that Lemma 3 only involves one router r at a distance k + 1 from 
d. Starting from a k-stable state, we need to show that r converges to the cor- 
rect value. Moreover, since all future states of the system are guaranteed to 
be k-stable (Lemma 2), r will receive advertisements from only two kinds of 
neighbors—those within the k-circle, and those outside it. This leads us to a 
finitary abstraction of the system. We can then prove the lemma using SPIN, 
which performs an exhaustive state search to prove that r will converge to the 
right value. 

However, we need to prove that the finitary abstraction is property-preserving. 
This proof is done in HOL, by induction on the length of fair advertisement tra- 
ces. The abstraction proof is the crucial link that allows us to join the HOL and 
SPIN results, without loss of rigor. The abstraction proof itself is rather
complex with a large case analysis. However, the effort is justified since the finitary
abstraction can then be used for multiple proofs with minor modifications. This
re-use can be seen in the proof statistics in Section 6 (Table 1). The abstrac- 
tion proof is represented in the table as the HOL portion of the second proof of 
Stability Preservation (Lemma 2). 

Finally, using the fact that there are only finitely many routers, we easily 
derive the Progress Lemma which proves our inductive step: 


Lemma 4 (Progress). For any k < 15, if the universe U is k-stable at some 
point, then U will eventually become and remain (k + 1)-stable indefinitely. □


The main result about the convergence of RIP is a corollary of the above 
inductive argument: 


Theorem 1 (Convergence of RIP). Starting from an arbitrary sound in- 
itial state, evolving under an arbitrary fair advertisement trace, the universe U 
eventually becomes and remains 15-stable. □


4.2 Significance of the Results 


Our proof, which we call the radius proof, differs from the one described in [1] for 
the asynchronous Bellman-Ford algorithm. Rather than inducting on estimates 
for upper and lower bounds for distances, we induct on the radius of the
stable region around d. The proof has three attributes of interest: 


1. It states a property about the RIP protocol, rather than the asynchronous
distributed Bellman-Ford algorithm.

2. The radius proof is more informative. It shows that correctness is achie- 
ved quickly close to the destination, and more slowly further away. It also 
implicitly estimates the number of advertisements needed to progress from 
k-stability to (k + 1)-stability. We exploit this in the next section to show a 
realtime bound on convergence. 

3. It uses a combination of theorem proving and model checking. HOL is more 
expressive and serves as the main platform. SPIN is used to treat large case 
analyses. 


5 Sharp Timing Bounds for RIP Stability 


In the previous section we proved convergence for RIP conditioned on the fact 
that the topology stays unchanged for some period of time. We now calculate 
how big that period of time must be. To do this, we need to have some knowledge 
about the times at which certain protocol events happen. In the case of RIP, we 
use a single reliability assumption that describes the frequency of advertisements. 


Fundamental Timing Assumption: There is a value $\Delta$, such that during
every topology-stable time interval of the length $\Delta$, each router gets at least
one advertisement from each of its neighbors. □

RIP routers normally try to advertise every 30 seconds. However, because of
congestion or some other condition, some packets may not go through. This is 
why the standard prescribes that a failure to receive an advertisement within 
180 seconds is treated as a link failure. Thus, $\Delta = 180$ seconds satisfies the
Fundamental Timing Assumption for RIP. Notice that the assumption implies 
fairness of the advertisement trace. 

As before, we concentrate on a particular destination d. Our timing analysis 
is based on the notion of weak k-stability. 


Definition 4 (Weak stability). For $2 \leq k \leq 15$, we say that the universe U
is weakly k-stable if all of the following conditions hold:


(WS1) U is (k - 1)-stable.
(WS2) $\forall r.\ D(r) = k \Rightarrow (r \text{ is } k\text{-stable} \lor hops(r) > k)$.
(WS3) $\forall r.\ D(r) > k \Rightarrow hops(r) > k$.


Weak k-stability is stronger than (k - 1)-stability, but weaker than k-stability.
The second disjunct in WS2 is what distinguishes it from the ordinary k-stability.
As before, we have the preservation lemma:


Lemma 5 (Preservation of weak stability). For any $2 \leq k \leq 15$, if the
universe U is weakly k-stable at some time t, then it is weakly k-stable at any
time t' > t. □


This lemma and all of the subsequent results in this section are stated using real 
time. This is possible because of the Fundamental Timing Assumption, which 
provides a connection between discrete advertisement events and continuous 
time. Precisely, to show that some property P holds after a $\Delta$ time interval, it is
enough to prove that P holds after each router receives at least one advertisement 
from each of its neighbors. 

Now we show that the initial state inevitably becomes weakly 2-stable after 
RIP packets have been exchanged between every pair of neighbors: 


Lemma 6 (Initial progress). If the universe U is in a sound state and the
topology does not change, U becomes weakly 2-stable after $\Delta$ time. □


The main progress property says that it takes one $\Delta$-interval to get from a
weakly k-stable state to a weakly (k + 1)-stable state. This property is shown
in two steps. First we show that condition WS1 for weak (k + 1)-stability holds
after $\Delta$:


Lemma 7. For any $2 \leq k \leq 15$, if the universe is weakly k-stable at some time
t, then it is k-stable at time $t + \Delta$. □


Then we show the same for conditions WS2 and WS3. The following puts both 
steps together: 


Lemma 8 (Progress). For any $2 \leq k < 15$, if the universe is weakly k-stable
at some time t, then it is weakly (k + 1)-stable at time $t + \Delta$. □

Lemmas 5, 6, and 8 are proved in SPIN (Lemma 7 is contained in Lemma 8).
The technique for doing the proofs in SPIN is the same as in the previous section.
We find a finitary abstraction of the system starting from the time when the
universe is weakly k-stable. This abstraction allows us to prove the lemmas in
SPIN for a single router.

The radius of the universe (around d) is the maximum distance from d: 


$$R = \max\{D(r) \mid r \text{ is a router}\}.$$


The main theorem describes convergence time for a destination in terms of its 
radius: 


Theorem 2 (RIP convergence time). A sound universe of radius R becomes
15-stable within $\min\{15, R\} \cdot \Delta$ time, assuming that there were no topology
changes during that time interval. □


The theorem is an easy corollary of the preceding lemmas and is proved in HOL. 
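Consider a universe of radius $R \leq 15$. To show that it converges in $R \cdot \Delta$ time,
observe what happens during each $\Delta$-interval of time:


after $\Delta$: weakly 2-stable (by Lemma 6)
after $2 \cdot \Delta$: weakly 3-stable (by Lemma 8)
after $3 \cdot \Delta$: weakly 4-stable (by Lemma 8)
...
after $(R - 1) \cdot \Delta$: weakly R-stable (by Lemma 8)
after $R \cdot \Delta$: R-stable (by Lemma 7)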


R-stability means that all the routers that are not more than R hops away from d 
will have shortest routes to d. Since the radius of the universe is R, this includes 
all routers. 

An interesting observation is that progress from k-stability to (k+1)-stability
is not guaranteed to happen in less than $2 \cdot \Delta$ time (we leave this to the reader).
Consequently, had we chosen to calculate convergence time using stability, rather
than weak stability, we would get a worse upper bound of $2 \cdot (R - 1) \cdot \Delta$. In
fact, our upper bound is sharp: in a linear topology, update messages can be
interleaved in such a way that convergence time becomes as bad as $R \cdot \Delta$.
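
As a purely illustrative calculation, take the advertisement bound (written $\Delta$ here) to be 180 seconds and assume a hypothetical radius $R = 5$; this radius is chosen only for the example and is not taken from the text. The two bounds then compare as follows:

```latex
\begin{align*}
R \cdot \Delta &= 5 \cdot 180\,\mathrm{s} = 900\,\mathrm{s} = 15\ \text{minutes},\\
2 \cdot (R - 1) \cdot \Delta &= 2 \cdot 4 \cdot 180\,\mathrm{s} = 1440\,\mathrm{s} = 24\ \text{minutes}.
\end{align*}
```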

Figure 2 shows an example that consists of k routers and has the radius k 
with respect to d. Router $r_1$ is connected to d and has the correct metric. Router 
$r_2$ also has the correct metric, but points in the wrong direction. Other routers 
have no route to d. In this state, $r_2$ will ignore a message from $r_1$, because that 
route is no better than what $r_2$ (thinks it) already has. However, after receiving 
a message from $r_3$, to which it points, $r_2$ will update its metric to 16 and lose the 
route. Suppose that, from this point on, messages are interleaved in such a way 
that during every update interval, all routers first send their update messages and 
then receive update messages from their neighbors. This will cause exactly one 
new router to discover the shortest route during every update interval. Router 
$r_2$ will have the route after the second interval, $r_3$ after the third, ..., and $r_k$ 
after the k-th. This shows that our upper bound of $k \cdot \Delta$ is sharp. 
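
The scenario can be replayed with a small simulation. This is only a sketch under stated assumptions: routers are numbered 1..k on a line, 0 stands for d, timers are ignored, every router sends a snapshot of its state before any message is received, and messages are processed in order of sender index (so that in the first interval router 2 hears from router 1 before router 3). The function name `simulate_line` is hypothetical.

```python
INFINITY = 16  # RIP metric meaning "no route"

def simulate_line(k):
    """Return {router: first interval after which it holds the shortest route}."""
    if k < 3:
        raise ValueError("the scenario needs at least three routers")
    hops = {i: INFINITY for i in range(1, k + 1)}
    next_r = {i: None for i in range(1, k + 1)}
    hops[1], next_r[1] = 1, 0   # router 1: correct metric, correct direction
    hops[2], next_r[2] = 2, 3   # router 2: correct metric, wrong direction

    def has_route(i):
        return hops[i] == i and next_r[i] == i - 1

    converged = {i: 0 for i in range(1, k + 1) if has_route(i)}
    for interval in range(1, k + 1):
        # Send phase: every router advertises a snapshot of its current state,
        # using split horizon with poisoned reverse.
        messages = []
        for sender in range(1, k + 1):
            for receiver in (sender - 1, sender + 1):
                if 1 <= receiver <= k:
                    poisoned = next_r[sender] == receiver
                    metric = INFINITY if poisoned else hops[sender]
                    messages.append((sender, receiver, metric))
        # Receive phase: apply the RIP update rule to each message.
        for sender, receiver, hop_cnt in messages:
            new_metric = min(1 + hop_cnt, INFINITY)
            if sender == next_r[receiver] or new_metric < hops[receiver]:
                hops[receiver], next_r[receiver] = new_metric, sender
        for i in range(1, k + 1):
            if has_route(i):
                converged.setdefault(i, interval)
    return converged

if __name__ == "__main__":
    print(simulate_line(5))
```

Under these assumptions, router 2 loses its route in the first interval and regains it in the second, and each later interval adds exactly one more router, so the last router converges in interval k.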
Fig. 2. Maximum Convergence Time 


6 Analysis of Methodology 


SPIN is extremely helpful for proving properties such as Lemma 8, which involve 
tedious case analysis. To illustrate this, assuming weak k-stability at time t, 
consider what it takes to show that condition WS2 for weak (k + 1)-stability 
holds after $\Delta$ time. (WS1 will hold because of Lemma 7, but further effort is 
required for WS3.) To prove WS2, let r be a router with D(r) = k + 1. Because 
of weak k-stability at time t, there are two possibilities for r: (1) r has a 
k-stable neighbor, or (2) all of the neighbors of r have hops > k. To show that r will 
eventually progress into either a (k + 1)-stable state or a state with hops > k + 1, 
we need to further break case (2) into three subcases with respect to the 
properties of the router that r points to: (2a) r points to $s \in C_k$ (the k-circle), 
which is the only neighbor of r from $C_k$, or (2b) r points to $s \in C_k$, but r has 
another neighbor $t \in C_k$ such that $t \neq s$, or (2c) r points to $s \notin C_k$. Each of 
these cases branches into several further subcases based on the relative ordering 
in which r, s and possibly t send and receive update messages. 

Doing such proofs by hand is difficult and prone to errors. Essentially, the 
proof is a deeply-nested case analysis in which final cases are straight-forward to 
prove—an ideal job for a computer to do! Our SPIN verification is divided into 
four parts accounting for different kinds of topologies. Each part has a distinguished 
process representing r and other processes modeling the environment for 
r. An environment is an abstraction of the ‘rest of the universe’. It generates all 
message sequences that could possibly be observed by r. SPIN considered more 
cases than a manual proof would have required, 21,804 of them altogether for 
Lemma 8, but it checked these in only 1.7 seconds of CPU time. Even counting 
set-up time for this verification, this was a significant time-saver. The resulting 
proof is probably also more reliable than a manual one. 

Table 1. Protocol Verification Effort on RIP Convergence 

Task                          | HOL                              | SPIN
Modeling RIP                  | 495 lines, 19 defs, 20 lemmas    | 141 lines
Stability Preservation Once   | 9 lemmas, 119 cases, 903 steps   |
Stability Preservation Again  | 29 lemmas, 102 cases, 565 steps  | 207 lines, 439 states
Initial Weak Stability        |                                  |
Weak Stability Progress       |                                  |

Table 1 summarizes some of our experience with the complexity of the proofs 
in terms of our automated support tools. The complexity of an HOL verification 
for the human verifier is described with the following statistics measuring things 
written by a human: the number of lines of HOL code, the number of lemmas 
and definitions, and the number of proof steps. Proof steps were measured as 
the number of instances of the HOL construct THEN. The HOL automated con- 
tribution is measured by the number of cases discovered and managed by HOL. 
This is measured by the number of THENL’s, weighted by the number of elements 
in their argument lists. The complexity of SPIN verification for the human ve- 
rifier is measured by the number of lines of Promela code written. The SPIN 
automated contribution is measured by the number of states examined and the 
amount of memory used in the verification. In our investigations we have fo- 
und that SPIN is generally memory bound, that is, it runs out of memory in 
a relatively short period of time if the state space it must search is too large. 
For our final RIP proofs, however, each of the verifications took less than a mi- 
nute and the time is generally proportional to the memory. Most of the lemmas 
consumed the SPIN-minimum of 2.54MB of memory, some required more. The 
figures were collected for runs on a lightly-loaded Sun Ultra Enterprise with 
1016MB of memory and 4 CPUs running SunOS 5.5.1. The tool versions used 
were HOL90.10 and SPIN-3.24. We carried out parallel proofs of Lemma 2, the 
Stability Preservation Lemma, using HOL only and HOL together with SPIN. 


It is important to observe that the SPIN figures were derived from final runs. 
The typical process was as follows: attempt to prove a result with SPIN, find 
that it is too costly, apply an abstraction that was proved in HOL, and try the 
SPIN proof again on the abstracted problem (which presumably has a smaller 
set of cases to check). This was repeated until we were happy with the size of 
the SPIN state space and the clarity of the abstractions. This use of SPIN was 
worthwhile even if the proof was eventually carried out entirely in HOL since 
SPIN provided a quick way to ‘debug’ our lemmas. We experimented with the 
question of whether to stop with a mixed HOL/SPIN proof or complete the entire 
proof in HOL. A proof entirely in HOL arguably provides more confidence since 
the relationship between the HOL and SPIN parts of a proof is treated manually 
in our study. We proved stability preservation twice, once using HOL/SPIN and 
again using only HOL. Table 1 indicates some associated statistics showing that 
the complexity of the HOL proof dropped by about 40% at the cost of writing 
207 lines of SPIN code. In future work we may attempt to measure programmer 
months since this would provide a more complete indication of scalability. 
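
The "about 40%" figure can be checked against the step counts in Table 1. The sketch below assumes that "complexity" refers to the number of proof steps, which is an interpretation rather than something the text states explicitly.

```python
hol_only_steps = 903  # "Stability Preservation Once" row of Table 1
mixed_steps = 565     # row that also lists SPIN figures

# Relative reduction in hand-written HOL proof steps.
reduction = (hol_only_steps - mixed_steps) / hol_only_steps
print(f"{reduction:.1%}")  # prints 37.4%
```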


7 Related Work 


Combining model checking with theorem proving has long been recognized as a 
very promising direction in effective formal methods [2]. There are primarily two 
ways in which the methodologies can be combined. Systems like PVS [16] use 
model checking as a decision procedure to solve finitary sub-goals in a deductive 
proof. On the other hand, model checking can be used to prove a finitary ab- 
straction of a system where the soundness of the abstraction can be proved in a 
theorem prover [13,14]. We use the latter methodology for our proofs—we carry 
out our induction and abstraction proofs in HOL90, while the induction step is 
proved for a finitary abstraction of the system in SPIN. 

In recent years, a variety of protocol standards have been formally verified. 
Notable success has been achieved in verifying cache coherence protocols, bus 
protocols and endpoint communication protocols [2]. In the domain of routing 
protocols, there has been work on verifying ATM routing protocols [3], where the 
authors use SPIN to verify the absence of deadlock of the routing protocol for a 
few fixed configurations. An instance verification of an Active Network routing 
protocol has been carried out in Maude [18]. Formal testing support has been 
developed for multicast routing protocols [7]. Other work has been in the form 
of manual proofs of key safety properties [6,15,4,1]. 


8 Conclusion 


This paper provides the most extensive automated mathematical analysis of an 
internet routing protocol to date. Our results show that it is possible to provide 
formal analysis of correctness for routing protocols from IETF standards and 
drafts with reasonable effort and speed, thus demonstrating that these techniques 
can effectively supplement other means of improving assurance such as manual 
proof, simulation, and testing. Specific technical contributions include the first 
proofs of the convergence of the RIP standard, and a sharp realtime bound for 
this convergence. We have also gained insight into strategies for combining a 
higher-order theorem prover such as HOL with a model checker such as SPIN in 
a unified methodology that leverages the expressiveness of the theorem prover 
and the high level of automation of the model checker to provide an efficient but 
high-confidence analysis. 
A Code Samples 


A.1 Pseudocode for RIP Declarations 


process RIPRouter 

state: 
  me                         // ID of the router 
  interfaces                 // Set of router's interfaces 
  known                      // Set of destinations with known routes 
  hops_dest                  // Distance estimate 
  nextR_dest                 // Next router on the way to dest 
  nextIface_dest             // Interface over which the route advertisement was received 
  timer expire_dest          // Expiration timer for the route 
  timer garbageCollect_dest  // Garbage collection timer for the route 
  timer advertise            // Timer for periodic advertisements 

events: 
  receive RIP (router, dest, hopCnt) over iface 
  timeout (expire_dest) 
  timeout (garbageCollect_dest) 
  timeout (advertise) 

utility functions: 
  broadcast(msg, iface) 
  { Broadcast message msg to all the routers attached to the network on the other side 
    of interface iface. 
  } 


A.2 Pseudocode for RIP Event Handlers 


event handlers: 
  receive RIP (router, dest, hopCnt) over iface 
  { 
    newMetric ← min (1 + hopCnt, 16) 
    if (dest ∉ known) then 
    { 
      if (newMetric < 16) 
      { 
        hops_dest ← newMetric 
        nextR_dest ← router 
        nextIface_dest ← iface 
        set expire_dest to 180 seconds 
        known ← known ∪ {dest} 
      } 
    } else 
    { if (router = nextR_dest) or (newMetric < hops_dest) 
      { 
        hops_dest ← newMetric 
        nextR_dest ← router 
        nextIface_dest ← iface 
        set expire_dest to 180 seconds 
        if (newMetric = 16) then 
        { set garbageCollect_dest to 120 seconds 
        } else 
        { deactivate garbageCollect_dest 
        }}}} 

  timeout (expire_dest) 
  { hops_dest ← 16 
    set garbageCollect_dest to 120 seconds 
  } 

  timeout (garbageCollect_dest) 
  { known ← known − {dest} 
  } 

  timeout (advertise) 
  { 
    for each dest ∈ known do 
      for each i ∈ interfaces do 
      { 
        if (i ≠ nextIface_dest) then 
        { 
          broadcast ([RIP(me, dest, hops_dest)], i) 
        } else 
        { 
          broadcast ([RIP(me, dest, 16)], i) // Split horizon with poisoned reverse 
        } 
      } 
    set advertise to 30 seconds 
  } 
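
The advertise handler can be sketched as a pure function. This is an illustration rather than the authors' code: the function name `advertisements` and the dictionary-based state are assumptions, and split horizon with poisoned reverse is taken in its usual sense, namely that a route is advertised with metric 16 on the interface over which it was learned.

```python
INFINITY = 16  # RIP metric meaning "unreachable"

def advertisements(me, known, hops, next_iface, interfaces):
    """Return (interface, message) pairs sent when the advertise timer fires."""
    out = []
    for dest in sorted(known):
        for iface in interfaces:
            # Poisoned reverse: advertise infinity back over the interface
            # from which the route to dest was learned.
            metric = INFINITY if iface == next_iface[dest] else hops[dest]
            out.append((iface, ("RIP", me, dest, metric)))
    return out

if __name__ == "__main__":
    msgs = advertisements("r", {"d"}, {"d": 3}, {"d": "eth0"}, ["eth0", "eth1"])
    for iface, msg in msgs:
        print(iface, msg)
```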


A.3  HOL Code for Update Function 


val update_DEF = new_definition 
  ("update", 
   --`!(src:'router) (net:'network) (rcv:'router) (hopcount:num) 
       (hops:'router->num) (nextN:'router->'network) (nextR:'router->'router). 
      update (hops,nextN,nextR) (src,net,rcv,hopcount) = 
        let (nh,nn,nr) = 
          (((nextR(rcv)=src) /\ (nextN(rcv)=net)) => 
              (SUC hopcount,net,src) 
            | (((SUC hopcount) < hops(rcv)) => 
                 (SUC hopcount,net,src) 
               | (hops(rcv),nextN(rcv),nextR(rcv)))) 
        in ((\r:'router.(r=rcv)=> nh | (hops(r))), 
            (\r:'router.(r=rcv)=> nn | (nextN(r))), 
            (\r:'router.(r=rcv)=> nr | (nextR(r))))`--); 
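
For readers less familiar with HOL90 syntax, the same update function can be approximated in Python. This is a sketch only: dictionaries stand in for the HOL functions `hops`, `nextN` and `nextR`, and, like the HOL definition, it does not cap the metric at 16.

```python
def update(state, msg):
    """Apply one advertisement to the routing state and return the new state."""
    hops, next_n, next_r = state
    src, net, rcv, hopcount = msg
    new_hops = hopcount + 1  # SUC hopcount
    if next_r[rcv] == src and next_n[rcv] == net:
        # Message from the current next hop: always accept it.
        entry = (new_hops, net, src)
    elif new_hops < hops[rcv]:
        # Strictly better route: switch to it.
        entry = (new_hops, net, src)
    else:
        entry = (hops[rcv], next_n[rcv], next_r[rcv])
    nh, nn, nr = entry
    # Only the receiver's entries change; all other routers keep their state.
    return ({**hops, rcv: nh}, {**next_n, rcv: nn}, {**next_r, rcv: nr})

if __name__ == "__main__":
    s0 = ({"r": 5}, {"r": "n1"}, {"r": "x"})
    print(update(s0, ("s", "n2", "r", 1)))
```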


A.4 Promela fragment for Routing Process 


proctype Update(router ME){ 
  mesg adv; 
  chan in = routerinput[ME]; 

  do 
  :: atomic{in?adv -> 
       if 
       :: (adv.src == rtable[ME].nextR) && 
          (adv.net == rtable[ME].nextN) -> 
            if 
            :: adv.hopcount >= INFINITY -> 
                 rtable[ME].hops = INFINITY 
            :: adv.hopcount < INFINITY -> 
                 rtable[ME].hops = adv.hopcount + 1 
            fi 
       :: adv.hopcount + 1 < rtable[ME].hops -> 
            rtable[ME].nextR = adv.src; 
            rtable[ME].nextN = adv.net; 
            rtable[ME].hops = adv.hopcount + 1 
       :: else -> skip 
       fi} 
  od 
} 
Abstract. Families of inductive types defined by recursion arise in the 
formalization of mathematical theories. An example is the family of term 
algebras on the type of signatures. Type theory does not allow the direct 
definition of such families. We state the problem abstractly by defining 
a notion, strong positivity, that characterizes these families. Then we in- 
vestigate its solutions. First, we construct a model using wellorderings. 
Second, we use an extension of type theory, implemented in the proof 
tool Coq, to construct another model that does not have extensionality 
problems. Finally, we apply the two level approach: We internalize in- 
ductive definitions, so that we can manipulate them and reason about 
them inside type theory. 


1 Introduction 


In type theory we can define a new inductive type by giving its constructors (or 
introduction rules). For example, we define the types of natural numbers, binary 
trees, and lists over a type A as 


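$$
\frac{}{0 : \mathbb{N}} \qquad \frac{n : \mathbb{N}}{(S\ n) : \mathbb{N}}\,; \qquad
\frac{}{\mathsf{leaf} : T} \qquad \frac{x_1 : T \quad x_2 : T}{\mathsf{node}(x_1, x_2) : T}\,; \qquad \text{and}
$$
$$
\frac{}{\mathsf{nil}_A : \mathsf{List}(A)} \qquad \frac{a : A \quad l : \mathsf{List}(A)}{\mathsf{cons}_A(a, l) : \mathsf{List}(A)}\,,
$$
respectively. 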


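Consider the family $T \colon \mathbb{N} \to *$ ($*$ indicates the type of all small types, or 
sets) of inductive types indexed on the natural numbers: 

$$
\begin{array}{ll}
T_0: & \dfrac{}{c_0 : T_0} \\[8pt]
T_1: & \dfrac{}{c_1 : T_1} \qquad \dfrac{x : T_1}{c_0(x) : T_1} \\[8pt]
T_2: & \dfrac{}{c_2 : T_2} \qquad \dfrac{x : T_2}{c_1(x) : T_2} \qquad \dfrac{x_1 : T_2 \quad x_2 : T_2}{c_0(x_1, x_2) : T_2}
\end{array}
\tag{1}
$$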


J. Harrison and M. Aagaard (Eds.): TPHOLs 2000, LNCS 1869, pp. 73-89, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 


74 V. Capretta 


Every new type in the family is defined by a new constant and by the constructors 
of the previous type in the hierarchy with an extra recursive argument. Intuitively 
$T_n$ is the type of trees with branching degree at most n. In the standard 
formulation of inductive types this definition is not allowed: The constructors 
and their types must be given directly at the moment of definition of the inductive 
type, whereas the number of constructors of $T_n$ and their types are defined 
by recursion on n. 
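
The recursion on n can be made concrete by computing the constructor signature of each member of the family. The sketch below is only an illustration outside type theory: the function name `constructors` and the representation of a signature as a map from constructor names to arities are assumptions.

```python
def constructors(n):
    """Map each constructor name of T_n to its number of recursive arguments."""
    if n == 0:
        return {"c0": 0}
    # Every constructor of the previous type gains one recursive argument ...
    signature = {name: arity + 1 for name, arity in constructors(n - 1).items()}
    # ... and a new constant is added.
    signature[f"c{n}"] = 0
    return signature

if __name__ == "__main__":
    for n in range(3):
        print(n, constructors(n))
```

The point of the example is that the signature depends on the value of n, which is exactly what the standard formulation of inductive types does not permit.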

Families of this kind have not only theoretical interest. They arise in the 
course of formalization of mathematics in a proof tool. I first encountered them 
when I was working on the formalization of Universal Algebra in Coq (see [6] 
and [7]). The family of term algebras on the type of signatures is one of them. 
The type of single-sorted signatures is $\mathsf{Sig} := \mathsf{List}(\mathbb{N})$. Given a signature 
$\sigma := [a_1, \ldots, a_n]$, the type of terms over $\sigma$ is defined by 

$$
\mathsf{Term}_\sigma: \quad
\frac{t_{11} : \mathsf{Term}_\sigma \;\cdots\; t_{1a_1} : \mathsf{Term}_\sigma}{(f_1\ t_{11} \cdots t_{1a_1}) : \mathsf{Term}_\sigma}
\quad \cdots \quad
\frac{t_{n1} : \mathsf{Term}_\sigma \;\cdots\; t_{na_n} : \mathsf{Term}_\sigma}{(f_n\ t_{n1} \cdots t_{na_n}) : \mathsf{Term}_\sigma}
$$

(One of the $a_i$'s must be 0, so that $\mathsf{Term}_\sigma$ is nonempty.) We cannot obtain the 
family $\mathsf{Term} \colon \mathsf{Sig} \to *$ directly with an inductive definition, because the number 
and arity of the constructors depend on the signature $\sigma$: They are not fixed 
for the whole family. The situation is even more complicated when we consider 
many-sorted signatures, which require families of mutual inductive types. In 
[6] we used Martin-Löf's W types to solve this instance of the problem. Here we 
formulate the general problem, we show that W types still provide a good model, 
but also propose a better solution (which, however, requires an extension of type 
theory). You can see the details of its application to many-sorted algebras in [7]. 
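
To illustrate how the constructors of the term type depend on the signature, the following sketch checks whether a tree is a well-formed term over a given single-sorted signature. The encoding is an assumption made for the example: a signature is a list of arities, and a term is a pair of a function-symbol index (counted from 0) and a list of subterms.

```python
def is_term(term, signature):
    """Check that every function symbol is applied to exactly its arity."""
    index, args = term
    if not 0 <= index < len(signature):
        return False
    return len(args) == signature[index] and all(
        is_term(arg, signature) for arg in args
    )

if __name__ == "__main__":
    sig = [0, 2]                 # one constant and one binary symbol
    const = (0, [])
    print(is_term((1, [const, const]), sig))
    print(is_term((1, [const]), sig))
```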

In Section 3 we formulate the general problem: We propose an extension of 
the notion of strictly positive operator, which is used to determine the admis- 
sibility of inductive definitions, using positive type pointers—that is, terms that 
specify the positive occurrence of parameters in recursive definitions. 

In Section 4 we represent inductive types using Martin-Löf's type constructor 
for wellorderings (W types) (see [14,15] and chapter 15 of [18]), extending 
the work by Dybjer [10]. This solution has the disadvantage that structurally 
equal elements of a W-type are not always convertible, thus making the W-type 
representation only extensionally isomorphic to the desired inductive type. 

Alternatively, we can exploit the extension of the positivity condition im- 
plemented in the system Coq and described by Gimenez in [12]. It allows an 
inductive definition to inherit a positive occurrence of a type variable from ano- 
ther inductive definition. To use this construction in our case, we need to give 
a translation of our recursive family of operators into an inductive family. In 
Section 5 we give such translation and we use it to solve our problem. 

Finally, in Section 6 we use the two level approach (see [4], [5], [13] and [3]): 
Positivity is a metapredicate; that is, it is not expressed inside type theory but 
is an external syntactic property of type operators. This means that we cannot 
reason about positive operators and inductive definitions inside type theory. We 
internalize it by defining a type-theoretic predicate Positive expressing the 
metaproperty. We define also a type of codes for inductive types and associate a 
code to every proof of an instance of the predicate Positive. We define a function 
that associates a type to every code. Now we solve our problem by first, 
constructing a family of positive operators by recursion; second, proving their 
positivity inside type theory; third, obtaining the corresponding family of codes; 
finally, instantiating the codes to types. This last method has been completely 
formalized in the proof assistant Coq [2]. 


2 Inductive Types 


We work in a type theory that is at least as expressive as the Pure Type System 
$\lambda P\underline{\omega}$ (see [1]): There are two sorts of types, $*$ for small types and $\Box$ for large 
types. Sort $*$ is an element of $\Box$. Moreover, we have sum and $\Sigma$ types, which 
can be considered as special cases of inductive types, which we define later in 
this section. In $\lambda P\underline{\omega}$ every small type $T \colon *$ has an isomorphic version in $\Box$. 
For simplicity we identify the two; in other words, we consider $*$ and $\Box$ as the 
first two steps in a cumulative hierarchy of type universes. When we write type 
expressions that mix the two sorts, as $T \times *$ or $T + *$, the version of T in $\Box$ is 
used. Note, however, that if $*$ is impredicative (for example, if we work in the 
Calculus of Constructions) not all elements of $*$ can have a representation in $\Box$, 
because this would lead to Girard's paradox (see [8]). Only if impredicativity 
was not used in the definition of the type can we consider it as an element of 
$\Box$. When we use small types in $\Box$ constructions we assume that this condition 
is satisfied without saying it (as supported by the Coq implementation). 

We use the notation t[x] to denote a term t in which a variable x may occur. 
Thus t and t[x] denote the same term, but in the second expression we stress 
the dependence on x. Do not confuse this notation with (f x), which denotes 
the application of a function f to x. If s is a term of the same type as x, t[s] 
denotes the result of the substitution t[x := s]. 

In extensional type theory inductive types can be implemented as fixed points 
of type operators (see [17]). We are working in intensional type theory, in which 
inductive types are recursively defined by constructors. Following [9], [19], [21] 
and [23] an inductive type I is defined by a list of constructors: 

$$
\begin{array}{l}
\mathsf{inductive}\ I\,[\bar{x} : \bar{S}] : (z_1 : Q_1) \cdots (z_m : Q_m)\,{*} := \\
\quad c_1 : (x_1 : P_{11}) \cdots (x_{k_1} : P_{1k_1})(I\ M_{11} \cdots M_{1m}) \\
\quad \vdots \\
\quad c_n : (x_1 : P_{n1}) \cdots (x_{k_n} : P_{nk_n})(I\ M_{n1} \cdots M_{nm}) \\
\mathsf{end},
\end{array}
\tag{2}
$$

where I does not occur in the $S$s, $Q$s and $M$s and occurs only strictly positively 
in the $P$s. $\bar{x}$ is a list of general parameters of I (such as the parameter X in 
List(X)). See one of the cited references or chapter 4 of the Coq manual [2] for 
the definition of strict positivity and for the other rules. 

If the types of the constructors do not use dependent product (that is, they 
are in the form $P_{i1} \to \cdots \to P_{ik_i} \to I$) we can use the alternative formulation 
of inductive types as fixed points of strictly positive type operators (see, for example, 
[10]). It is less intuitive but simpler for theoretical purposes, so we adopt 
it. Every strictly positive operator $X \colon * \vdash \Phi[X] \colon *$ has a functorial extension, 
which, for $X, Y \colon *$, maps every $f \colon X \to Y$ to a function $\Phi[f] \colon \Phi[X] \to \Phi[Y]$, 
preserving identities and composition (see [9] and [20]). This condition is 
sufficient to formulate the rules for inductive types (Matthes [16] gives an extension 
of system F in which this is the only condition required for inductive types). 
In the next sections we consider extensions of the positivity condition that still 
have the functorial property. The rules for inductive types are then the same 
as in the following definition, with the corresponding property replacing strictly 
positive. 


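Definition 1. Let $X \colon * \vdash \Phi[X] \colon *$ be a strictly positive operator. The inductive 
type $\mu_X(\Phi)$ is defined by the following rules (where we write I for $\mu_X(\Phi)$): 

formation 
$$I \colon *$$

introduction 
$$\frac{y \colon \Phi[I]}{(\mu\text{-intro}\ y) \colon I}$$

elimination 
$$\frac{x \colon I \vdash (P\ x) \colon * \qquad z \colon \Phi[(\Sigma\ I\ P)] \vdash u \colon (P\ (\mu\text{-intro}\ (\Phi[\pi_1]\ z)))}{(\mu\text{-ind}\ [z]u) \colon (x \colon I)(P\ x)}$$

conversion 
$$(\mu\text{-ind}\ [z]u\ (\mu\text{-intro}\ y)) \rightsquigarrow u[(\Phi[[x]\langle x, (\mu\text{-ind}\ [z]u\ x)\rangle]\ y)]$$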


We use this formulation to define our inductive types, since they are all non- 
dependent, but we use the notation of Formula 2 when it is intuitively clearer and 
when we need to define types whose constructors belong to dependent product 
types. 

If the elimination predicate P is a constant type T, we obtain the recursion 
principle; if, furthermore, the recursion term u does not depend on the induction 
arguments, we obtain the iteration principle: 

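$$
\frac{T \colon * \qquad z \colon \Phi[I \times T] \vdash u \colon T}{(\mu\text{-rec}\ [z]u) \colon I \to T}
\qquad
\frac{T \colon * \qquad z \colon \Phi[T] \vdash u \colon T}{(\mu\text{-it}\ [z]u) \colon I \to T}
$$
It is well known that the recursion and iteration principles are equivalent, whereas 
the full induction principle is a proper extension of them (see, for example, 
[11] or [20]). 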

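The types of natural numbers, binary trees, and lists over a type A can be 
defined as $\mathbb{N} := \mu_X(N_1 + X)$, where $N_1$ is the type with only one element $0_1$; 
$T := \mu_X(N_1 + X \times X)$; and $\mathsf{List}(A) := \mu_X(N_1 + A \times X)$, respectively. Their constructors 
can be defined in terms of the single constructor $\mu$-intro: 

$$
\begin{array}{ll}
0 := (\mu\text{-intro}\ (\mathsf{inl}\ 0_1)), & S := [n](\mu\text{-intro}\ (\mathsf{inr}\ n)); \\
\mathsf{leaf} := (\mu\text{-intro}\ (\mathsf{inl}\ 0_1)), & \mathsf{node} := [x_1, x_2](\mu\text{-intro}\ (\mathsf{inr}\ \langle x_1, x_2\rangle)); \\
\mathsf{nil} := (\mu\text{-intro}\ (\mathsf{inl}\ 0_1)), & \mathsf{cons} := [a, l](\mu\text{-intro}\ (\mathsf{inr}\ \langle a, l\rangle)).
\end{array}
$$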
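
The fixed-point encoding of the constructors can be mimicked in an untyped setting. The sketch below is only an analogy: the classes `Inl`, `Inr` and `MuIntro` and the helper functions are hypothetical names, the empty tuple stands for the single element of the one-element type, and Python does not enforce the typing discipline that separates numbers from lists.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Inl:
    value: Any

@dataclass(frozen=True)
class Inr:
    value: Any

@dataclass(frozen=True)
class MuIntro:
    body: Any

UNIT = ()  # stands for the single element of the one-element type

# Natural numbers as the fixed point of X |-> 1 + X.
zero = MuIntro(Inl(UNIT))

def succ(n):
    return MuIntro(Inr(n))

# Lists as the fixed point of X |-> 1 + A x X.
nil = MuIntro(Inl(UNIT))

def cons(a, l):
    return MuIntro(Inr((a, l)))

def to_int(n):
    """Count the successor layers of an encoded natural number."""
    count = 0
    while isinstance(n.body, Inr):
        count += 1
        n = n.body.value
    return count

def to_list(l):
    """Collect the elements of an encoded list."""
    items = []
    while isinstance(l.body, Inr):
        head, l = l.body.value
        items.append(head)
    return items

if __name__ == "__main__":
    print(to_int(succ(succ(zero))), to_list(cons(1, cons(2, nil))))
```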
The problem that we consider here is: Given a family of type operators 
$\Phi \colon A \to (* \to *)$ such that every element of it is strictly positive, can we 
construct the corresponding family of inductive types? Observe that it is not 
possible to characterize such families in a decidable way. In fact, to every function 
$f \colon \mathbb{N} \to \mathbb{N}$ we can associate a family of operators: 

$$
\Phi \colon \mathbb{N} \to (* \to *), \qquad
(\Phi\ n\ X) = \begin{cases} X & \text{if } (f\ n) = 0 \\ X \to X & \text{otherwise.} \end{cases}
$$


Deciding whether every element of this family is strictly positive is equivalent 
to deciding whether f is constantly 0. Since, in type theory, every primitive 
recursive function on the natural numbers is definable, we would be able to decide 
whether any such function is constantly 0, which is notoriously impossible. 
The following section gives a decidable characterization of some of these 
families, which is wide enough for the examples that we are considering. 


3 Families of Inductive Types Defined by Strong 
Elimination 


Strong elimination is the elimination rule for inductive types in which the elimination 
predicate is allowed to be big; that is, we can have an elimination predicate 
$x \colon I \vdash (P\ x) \colon \Box$. If $*$ is impredicative, strong elimination results in inconsistency 
(see [8]). Nevertheless, it can still be admitted if the inductive type I is defined 
without the use of impredicativity, that is, as already mentioned, if there is a 
type in $\Box$ isomorphic to it. In such a case we allow strong elimination. This form 
of strong elimination is supported in Coq. We use strong elimination only in the 
form of iteration over the type $*$: 

$$\frac{Z \colon \Phi[*] \vdash W[Z] \colon *}{(\mu\text{-it}\ [Z]W) \colon I \to *}$$


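We recast Example 1 as 

$$
\begin{array}{l}
X \colon * \vdash \Psi[X] \colon \mathbb{N} \to * \\
\Psi_0 := N_1 \\
\Psi_{(S\ n)} := N_1 + X \times \Psi_n.
\end{array}
$$

To be completely formal, we must write 

$$
\begin{array}{l}
X \colon *,\ Z \colon N_1 + * \vdash W[X, Z] \colon * \\
\quad := \mathsf{Case}\ Z\ \mathsf{of} \\
\qquad\quad (\mathsf{inl}\ \_) \Rightarrow N_1 \\
\qquad\quad (\mathsf{inr}\ R) \Rightarrow N_1 + X \times R \\
\qquad \mathsf{end} \\[6pt]
X \colon * \vdash \Psi[X] := (\mu\text{-it}\ [Z]W) \colon \mathbb{N} \to *
\end{array}
$$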
The desired family of inductive types is now specified by $T_n := \mu_X(\Psi_n)$. Unfortunately, 
this is not an allowed definition, since $\Psi$ does not satisfy the strict-positivity 
condition: Although $\Psi_n$ reduces to a strictly positive operator for each 
numeral n, $\Psi_x$ does not if $x \colon \mathbb{N}$ is a variable. Therefore, we cannot define the 
family $T \colon \mathbb{N} \to *$, even if every member of it is individually definable. 
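
The contrast between numerals and variables can be illustrated with a small syntactic checker. The sketch below works on a hypothetical tuple-based representation of type expressions and is not the positivity checker of any proof tool: it unfolds the recursive operator for a concrete numeral and confirms that the result is strictly positive, which is possible only because the recursion can be computed away.

```python
UNIT = ("unit",)   # the one-element type
X = ("var", "X")   # the type parameter

def occurs(x, t):
    """Does the type variable x occur in the type expression t?"""
    if t[0] == "var":
        return t[1] == x
    return any(occurs(x, s) for s in t[1:])

def strictly_positive(x, t):
    """Syntactic strict positivity of x in t for sums, products and arrows."""
    tag = t[0]
    if tag in ("unit", "var"):
        return True
    if tag in ("sum", "prod"):
        return strictly_positive(x, t[1]) and strictly_positive(x, t[2])
    if tag == "arrow":
        return not occurs(x, t[1]) and strictly_positive(x, t[2])
    raise ValueError(f"unknown type constructor: {tag}")

def psi(n):
    """Unfold the operator for a numeral n: unit at 0, unit + X * psi(n-1) otherwise."""
    return UNIT if n == 0 else ("sum", UNIT, ("prod", X, psi(n - 1)))

if __name__ == "__main__":
    print(psi(2))
    print(all(strictly_positive("X", psi(n)) for n in range(6)))
    print(strictly_positive("X", ("arrow", X, X)))
```

For a variable index there is nothing to unfold, so a purely syntactic check of this kind has no expression to inspect; this is the gap that the notion introduced next is meant to close.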


The case of term algebras over single-sorted signatures is similar. The operator 
associated to a signature $\sigma := [a_1, \ldots, a_n]$ is $\Psi_\sigma[X] := X^{a_1} + \cdots + X^{a_n}$. We 
can define first a single component ($n \colon \mathbb{N}, X \colon * \vdash X^n \colon *$) by strong elimination 
on the natural numbers and then $\Psi_\sigma$ by strong elimination on $\mathsf{List}(\mathbb{N})$. The type 
of terms associated with the signature $\sigma$ is then $\mathsf{Term}_\sigma := \mu_X(\Psi_\sigma)$. As in the 
previous example this definition is not allowed in the standard implementation 
of inductive types, because $\Psi_\sigma$ does not satisfy the strict-positivity condition. 


Our purpose is to find ways to define families of inductive types in type 
theory. The first step is a formal description of the problem—that is, an abstract 
characterization of the definitions we are looking for. If we consider the first of 
the preceding examples, we see that the reason why every single element of the 
family is strictly positive is that, in the recursive step of the definition, not only 
the type parameter X but the recursive call $\Psi_n$ (or R in the formalized version) 
also occurs only strictly positively. It is enough to require that all such recursive 
calls occur strictly positively. Note, however, that, in the formal version of the 
definition, the recursive call is Z, not R, which is just a bound variable in the 
Case construction. It doesn't mean anything to say that Z occurs positively 
and the variable R does not actually occur in W (being bound). So we need a 
finer notion than strict positivity. The following concept of positive type pointer 
solves the problem. To understand it intuitively, consider a generic premise for 
a definition by strong elimination ($X \colon *, Z \colon \Theta[*] \vdash U[X, Z] \colon *$) where $\Theta$ is a 
strictly positive operator. We want to define first the family of type operators 
$X \colon * \vdash \Psi[X] := (\mu\text{-it}\ [Z]U) \colon \mu_Y(\Theta[Y]) \to *$ and then the family of inductive 
types $x \colon \mu_Y(\Theta[Y]) \vdash \mu_X((\Psi\ x)) \colon *$. Imagine Z represented as a tree structure 
whose leaves are types representing recursive calls. We must require that each 
occurrence of a term pointing to such a leaf occurs only strictly positively in U. 


Figure 1 is a depiction of type pointers. It shows how a type pointer can be 
constructed for each of the type constructors that build new strongly positive 
operators. If Z is in a product type (clause 3), it is a pair, represented by a 
binary node with a subtree for each branch. A positive type pointer first chooses 
one of the components and then uses a positive type pointer for that component. 
If Z is in a function type (clause 5), the situation is similar, but the number of 
components is equal to the cardinality of the domain type (possibly infinite). 
If Z is in a sum type (clause 4), then it is in one of the two forms specified 
by the component types. A positive type pointer must take into account both 
possibilities, so it prescribes a type pointer for each of the two components. 
Therefore, the picture for clause 3 shows two positive type pointers corresponding 
to the two components, whereas the picture for clause 4 shows only one positive 
type pointer that consists of two components. The picture for clause 8 shows how 
a positive type pointer is used in a recursive definition of a family of strongly 
positive operators. The term W[X, Z] is the iterator of the recursive definition. 
It contains some direct occurrences of the variable X and some recursive calls, 
here indicated by the leaves $T_i$ and $T_j$ of the iteration variable Z. When $T_i$ 
and $T_j$ are replaced with the values of the recursive call, new occurrences of X 
appear. The requirement that W[X, Z], besides being strongly positive in X, 
is also a positive type pointer in Z causes all the new occurrences of X to be 
strictly positive. 

[Figure 1 panels: Clause 3: Z in a product type. Clause 4: Z in a sum type. Clause 5: Z in a function type. Clause 8: a strongly positive family.] 

Fig. 1. Illustration of Definition 2: the use of positive type pointers in the definition of 
families of strongly positive operators. 


Definition 2. A type operator ($X \colon * \vdash \Phi[X] \colon *$) that can be lifted to kinds 
($X \colon \Box \vdash \Phi[X] \colon \Box$) is strongly positive and a term ($Z \colon \Phi[*] \vdash U[Z] \colon *$) is a 
positive type pointer for $\Phi$ if they satisfy the following recursive clauses. 

1. If K is a type that does not depend on X (that is, X does not occur free in 
   K), then $\Phi[X] = K$ is strongly positive and $Z \colon K \vdash K \colon *$ is a positive type 
   pointer for $\Phi$. 

2. If $\Phi[X] = X$ then $\Phi$ is strongly positive and $Z \colon * \vdash Z \colon *$ is a positive type 
   pointer for $\Phi$. 

3. If $\Phi[X] = \Phi_1[X] \times \Phi_2[X]$ and $\Phi_1$ and $\Phi_2$ are strongly positive, then $\Phi$ is 
   strongly positive and if $Z_1 \colon \Phi_1[*] \vdash U_1[Z_1] \colon *$ and $Z_2 \colon \Phi_2[*] \vdash U_2[Z_2] \colon *$ are 
   positive type pointers for $\Phi_1$ and $\Phi_2$, respectively, then 

   $$Z \colon \Phi_1[*] \times \Phi_2[*] \vdash (U_1\ (\pi_1\ Z)) \colon * \quad \text{and}$$
   $$Z \colon \Phi_1[*] \times \Phi_2[*] \vdash (U_2\ (\pi_2\ Z)) \colon *$$

   are positive type pointers for $\Phi$. 

4. If $\Phi[X] = \Phi_1[X] + \Phi_2[X]$ and $\Phi_1$ and $\Phi_2$ are strongly positive, then $\Phi$ is 
   strongly positive and if $Z_1 \colon \Phi_1[*] \vdash U_1[Z_1] \colon *$ and $Z_2 \colon \Phi_2[*] \vdash U_2[Z_2] \colon *$ are 
   positive type pointers for $\Phi_1$ and $\Phi_2$, respectively, then 

   $$Z \colon \Phi_1[*] + \Phi_2[*] \vdash \mathsf{Case}\ Z\ \mathsf{of}\ (\mathsf{inl}\ Z_1) \Rightarrow U_1[Z_1] \mid (\mathsf{inr}\ Z_2) \Rightarrow U_2[Z_2]\ \mathsf{end} \colon *$$

   is a positive type pointer for $\Phi$. 

5. If $\Phi[X] = K \to \Phi'[X]$, where K is a type that does not depend on X, and 
   $\Phi'$ is a strongly positive type operator, then $\Phi$ also is strongly positive and if 
   $Z' \colon \Phi'[*] \vdash U'[Z'] \colon *$ is a positive type pointer for $\Phi'$, then, for every $k \colon K$, 

   $$Z \colon K \to \Phi'[*] \vdash U'[(Z\ k)] \colon *$$

   is a positive type pointer for $\Phi$. 

6. Suppose $t \colon A_1 + A_2$ for types $A_1$ and $A_2 \colon *$. If $\Phi[X] = (\mathsf{Case}\ t\ \mathsf{of}\ (\mathsf{inl}\ x_1) \Rightarrow \Phi_1[x_1, X] \mid (\mathsf{inr}\ x_2) \Rightarrow \Phi_2[x_2, X]\ \mathsf{end})$ 
   and $\Phi_1$ and $\Phi_2$ are strongly positive, 
   then $\Phi$ also is strongly positive. 

7. If $\Phi$ and $\Psi$ are strongly positive operators and $Z \colon \Phi[*] \vdash U[Z] \colon *$ is a positive 
   type pointer for $\Phi$, then $Z \colon \Phi[*] \vdash \Psi[U[Z]] \colon *$ also is a positive type pointer 
   for $\Phi$. 

8. If $Y \colon * \vdash \Theta[Y] \colon *$ is a strongly positive type operator, $I = \mu_Y(\Theta)$, and 
   $X \colon *, Z \colon \Theta[*] \vdash W[X, Z] \colon *$ is a positive type pointer for $\Theta$ with respect to Z 
   and is strongly positive with respect to X, then every element of the family 
   $X \colon * \vdash \Psi[X] := (\mu\text{-it}\ [Z]W[X, Z]) \colon I \to *$ is strongly positive; that is, for 
   every $i \colon I$, $X \colon * \vdash (\Psi[X]\ i) \colon *$ is a strongly positive type operator. 
Clause 7 may seem too restrictive because we do not consider the possibility 
that different positive type pointers for $\Phi$ may be used in $\Psi$. For example, if 
$\Psi[Y] = Y \times Y$, we may want to define the type pointer $X \colon * \vdash U_1[X] \times U_2[X]$ 
where $U_1$ and $U_2$ are different positive type pointers for $\Phi$. In that case we should 
modify $\Psi$ such that it becomes an operator on two parameters, $Y_1, Y_2 \colon * \vdash Y_1 \times Y_2$, 
and then apply clause 7 twice, the first time substituting $U_1[X]$ for $Y_1$, the second 
time substituting $U_2[X]$ for $Y_2$. This can be done in all similar situations. 


We do not include a definition of positive type pointer corresponding to the 
strongly positive operator obtained in clause 8. This further complication is not 
necessary to define the families of types in which we are interested. 


The definition of strongly positive type operator coincides with the definition 
of strictly positive type operator but for the last clause, which allows the defini- 
tion of families of strongly positive type operators by recursion, using a positive 
type pointer as the recursion term. 


For example, consider the family of type operators defined in Formula (3). 
We want to prove that $X \colon * \vdash (\Psi[X]\ n)$ is strongly positive for every $n \colon \mathbb{N}$. 
Since $\mathbb{N} = \mu_Y(\Theta[Y])$ with $\Theta[Y] = N_1 + Y$, we can use clause 8 of Definition 
2. We have to prove that $X \colon *, Z \colon N_1 + * \vdash W[X, Z] \colon *$ is a strongly positive 
type operator with respect to X and a positive type pointer for $\Theta$ with respect 
to Z. The first property follows from clause 6 and the easily verifiable fact that 
the two branches of the Case definition are strongly positive with respect to 
X (they are actually strictly positive). The second follows from clause 4 with 
$\Phi_1[Y] = N_1$ and $U_1[Z_1] = N_1$ (positive type pointer by clause 1), $\Phi_2[Y] = Y$ and 
$U_2[Z_2] = N_1 + X \times Z_2$ (positive type pointer by clause 7, with $\Psi[V] = N_1 + X \times V$, 
and clause 2). 

It follows that, in the system extended by Definition 2, we can define the 
family of inductive types $T := [n \colon \mathbb{N}]\mu_X(\Psi[X]\ n) \colon \mathbb{N} \to *$. 

Definition 2 does not add new inductive types to the system, but simply 
allows us to collect types in new families. 


Theorem 1. Every closed type μ_X(Φ) definable by Definition 2 is definable by
Definition 1 also.


Proof We must prove that every strongly positive operator X: * ⊢ Φ[X]: * in
which no free variable except X occurs, is strictly positive (or, better, reduces
to a strictly positive one). The proof is by induction on the number of times
clause 8 of Definition 2 is used. We don't need to consider the other clauses,
since they are the same in the definition of strict positivity. Suppose then that
Φ has been obtained by clause 8, that is, Φ = (Ψ a), where Ψ is as in clause 8
and a is a closed term of type I. We assume that a is in normal form (otherwise
we normalize it). We prove that (Ψ a) is strictly positive by induction on the
set of closed terms of I in normal form. (Note that this is structural induction
external to type theory, and not an internal application of the elimination rule.
This explains why a must be a closed term for it to work.) Suppose a = (μ-intro b)
with b: Θ[I] closed. Then


(Ψ a) = (Ψ (μ-intro b))
      = (μ-it [Z]Ψ[X, Z] (μ-intro b))
      ⇝ Ψ[X, (Θ[(μ-it [Z]Ψ[X, Z])] b)]
      = Ψ[X, (Θ[Ψ[X]] b)]


The term (Θ[Ψ[X]] b) can be represented as a tree isomorphic to the structure
tree of a and whose leaves are in the form (Ψ c), with c an element of I
structurally simpler than a. By induction hypothesis, for all recursive occurrences of
elements of I in b (that is, the elements of I that are structurally simpler than
a), the corresponding elements of the family Ψ are strictly positive. Since (Ψ a)
is strictly-positively constructed from such occurrences by the type pointer Ψ
(this is the main property of the notion of positive type pointer and can be
proved straightforwardly for every clause of Definition 2), it is also strictly positive.
□


4 Wellorderings 


In the previous section we proposed an extension of the notion of inductive type. 
We see now that, without extending type theory, we can encode the desired types 
and families as wellorderings. Wellorderings (also called W types) are types of 
trees specified by a type of nodes A and, for every element a of A, a type of 
branches (B a). This means that every node labelled with the element a has as 
many branches as the elements of (B a). 

Wellorderings were introduced by Martin-Löf [14,15] and used by Dybjer [10]
to encode all inductive types obtained from strictly positive operators. Here we
extend Dybjer's construction to strongly positive operators.


Definition 3. Let A: * and B: A → *. The type W(A, B) is defined by the
rules


formation      W(A, B): *


introduction   a: A    f: (B a) → W(A, B)
               ---------------------------
               (sup a f): W(A, B)


elimination    Let P: W(A, B) → *, then

               x: A, y: (B x) → W(A, B), z: (u: (B x))(P (y u))
                 ⊢ e[x, y, z]: (P (sup x y))
               -------------------------------------------------
               (W-ind [x, y, z]e): (w: W(A, B))(P w)


conversion     (W-ind [x, y, z]e (sup a f))
                 ⇝ e[a, f, [u: (B a)](W-ind [x, y, z]e (f u))]
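As an informal illustration outside type theory, the introduction and conversion rules can be mimicked with untyped data. The following Python sketch is only an analogy: the names Sup, w_rec, ZERO, succ, and to_int, and the encoding of natural numbers (node labels "zero" and "succ", with an empty branch type for "zero" and a one-element branch type for "succ"), are assumptions of this sketch, and only the non-dependent use of W-ind is shown.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Sup:
    """A node (sup a f): a label a: A and a function f: (B a) -> W(A, B)."""
    label: Any
    subtree: Callable[[Any], "Sup"]


def w_rec(e: Callable[[Any, Callable, Callable], Any], w: Sup) -> Any:
    """Non-dependent W-ind, following the conversion rule:
    (W-ind e (sup a f)) reduces to e(a, f, u -> (W-ind e (f u)))."""
    return e(w.label, w.subtree, lambda u: w_rec(e, w.subtree(u)))


def no_branch(u: Any) -> Sup:
    # (B "zero") is empty, so this function is never applied to a real branch.
    raise ValueError("the branch type of 'zero' is empty")


ZERO = Sup("zero", no_branch)


def succ(n: Sup) -> Sup:
    # (B "succ") has a single element, represented here by None.
    return Sup("succ", lambda u: n)


def to_int(w: Sup) -> int:
    # Fold a tree into a Python integer by recursion on its structure.
    return w_rec(lambda a, f, rec: 0 if a == "zero" else 1 + rec(None), w)
```

For instance, to_int(succ(succ(ZERO))) unfolds the conversion rule twice before reaching the leaf ZERO.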




Wellorderings can be realized in type theory with the standard implementation
of inductive types. Using Formula 2 we can define the W constructor
as

inductive W [A: *, B: A → *]: * :=
   sup: (x: A)((B x) → W(A, B)) → W(A, B)
end.


Dybjer showed in [10] that every strictly positive operator has an initial
algebra constructed by a W type. This result holds if we take an extensional
equality on the W type, that is, if we consider two elements (sup a_1 f_1) and
(sup a_2 f_2) of W(A, B) equal if a_1 and a_2 are convertible and if (f_1 b) = (f_2 b)
for every b: (B a_1). In intensional type theory, which is the one we use, the
second condition is not equivalent to the convertibility of f_1 and f_2. For this
reason, when we use W types, we have to deal explicitly with extensional equality.
They are, therefore, more cumbersome than direct inductive definitions. Having
stressed this drawback, we can extend Dybjer's result to strongly positive
operators.


Theorem 2. For every strongly positive operator X: * ⊢ Φ[X]: * there exist
A: * and B: A → * such that W(A, B) is an initial algebra of Φ. (For a formal
definition of initial algebras of type operators see, for example, [11] or [20].)


Proof The proof is by induction on the structure of Φ as in Dybjer [10]. Our
Definition 2 contains two extra clauses that are not present in Dybjer's definition:
clauses 6 and 8. Let us see how Dybjer's proof can be extended to include them.

Clause 6 is easily treated by defining A and B by cases on the term c in
the definition of Ψ and using the recursive results for the branches of the Case
expression. (See the following example.)

If Ψ is obtained by clause 8 we define the families A: I → * and B: (x: I)A_x → *
by recursion on I. Given x = (μ-intro y): I, we assume by inductive hypothesis
that A and B are defined for all the recursive occurrences of elements of I in y.
We define the new A_x and B_x by using Dybjer's construction for the occurrences
of X and of the recursive calls Z in Ψ[X, Z]. Formally, using Dybjer's method
recursively on the clauses of Definition 2, we can construct from Ψ two families
of operators Ψ_A and Ψ_B and then apply the iteration principle to obtain A and
the induction principle to obtain B:


Z_A: Θ[*] ⊢ Ψ_A[Z_A]: *
A := (μ-it [Z_A]Ψ_A): I → *


Z_B: Θ[Σ x: I.(A_x → *)] ⊢ Ψ_B[Z_B]: A_(μ-intro (Θ[π_1] Z_B)) → *
B := (μ-ind [Z_B]Ψ_B): (x: I)A_x → *


Note the difference with the proof of Theorem 1: The assumption that a
closed term a: I is used was essential to that proof. That was necessary because
we were proving an external predicate. But here we are constructing families
of types internal to type theory, therefore we can use the elimination rule of
type I to construct A (with elimination predicate P_A = [x: I]*) and B (with
elimination predicate P_B = [x: I]A_x → *). Therefore A_x and B_x are defined also
for a free variable x: I. □

This construction gives, in the case of the family of operators of Formula 3,
the following families of As and Bs:


A: N → *                          B: (n: N)A_n → *
A_0 := N_1                        (B_0 a) = ∅
A_(S n) := N_1 + A_n              (B_(S n) (inl _)) = ∅
        (≅ N_1 + N_1 × A_n)       (B_(S n) (inr a)) = N_1 + (B_n a)

The W construction for terms over a signature in Sig is described in [6], where 
it is extended to many-sorted signatures. 


5 Recursive vs. Inductive Families 


We remarked that the W construction has the disadvantage that extensionally 
equal terms are not always convertible. This is unavoidable when we use transfi- 
nite types, but it could and should be avoided with finitary types. The solution 
proposed in this section exploits an extension of inductive types implemented in 
the proof tool Coq (see [12]). This consists in extending the notion of strict
positivity to that of positivity by a clause that allows operators to inherit positive
occurrences of a parameter X from inductive definitions. 


Definition 4. A type operator X: * ⊢ Ψ[X]: * is positive if it satisfies the
clauses of the definition of strict positivity where we substitute "positive" for
"strictly positive" everywhere, and the new clause


   X is positive in (I t_1 ... t_m) if I is an inductive type and, for every term
   t_i, either X does not occur in t_i or X is positive in t_i, t_i instantiates a
   general parameter of I and this parameter is positive in the arguments
   of the constructors of I.


To apply this construction to our case we first need to replace the recursive
definition of a family of type operators with an inductive one. We illustrate the
method with the example of Formula 3. The family Ψ was defined by recursion
on the natural numbers. Instead we use the following inductive definition


inductive ind(Ψ)[X: *]: N → * :=
   ψ_0: ind(Ψ)_0
   ψ_1: (n: N)ind(Ψ)_(S n)
   ψ_2: (n: N)X → ind(Ψ)_n → ind(Ψ)_(S n)
end.


(The constructors ψ_0 and ψ_1 could be unified in a single constructor ψ_1:
(n: N)ind(Ψ)_n, but we keep them separate to keep the parallel with the
definition of Ψ in Formula 3.) X is a general parameter of ind(Ψ) and it is positive
in the arguments of the constructors: It appears only as the type of the first
argument of the constructor ψ_2. It follows from the clause in Definition 4 that
X: * ⊢ (ind(Ψ) X n) is a positive type operator for every n: N. In the type
system of Coq such positive operators can be used in the definition of inductive
types, thus the family T := [n]μ_X((ind(Ψ) X n)): N → * is admissible. Note
that the condition expressed in clause 8 of Definition 2 by requiring Ψ to be a
positive type pointer corresponds to the fact that the recursive calls must occur
positively in the definition of ind(Ψ). This translation can be done in general for
every strongly positive operator.


Theorem 3. For every strongly positive type operator X: * ⊢ Φ[X]: * there
exists a positive type operator X: * ⊢ ind(Φ)[X]: * such that, for every type
X: *, Φ[X] ≅ ind(Φ)[X].


Proof As usual the relevant case is clause 8 of Definition 2. If X: * ⊢ Ψ[X]: I → *
is defined as in that clause, then we replace it with the inductive family


inductive ind(Ψ)[X: *]: I → * :=
   ψ: (y: Θ[I])Ψ[X, (Θ[ind(Ψ)] y)] → ind(Ψ)_(μ-intro y)
end.


which can be proved to be positive according to Definition 4, by induction on
the proof that Ψ is a positive type pointer. The general parameter X occurs
only positively in the arguments of the constructor ψ because it occurs only
positively in Ψ (by induction hypothesis).

With this translation we always get inductive families with only one constructor.
In practice it is intuitively easier to break it down into several constructors,
as we did in the preceding example. □


6 Applying the Two Level Approach to Inductive Types 


The two level approach is a technique used for proof construction in type theory. 
A goal G is lifted to a syntactic level; that is, a term g, of a type Goal: * 
representing goals, is associated to G. Logical rules are reflected by functions or 
relations on Goal. To prove G we apply the functions or work with the relations 
on g. Once g is proved at the syntactic level, we can extract a proof of G. 

The technique is described in [4] and in Ruys' thesis [22]. It was used by
Boutin, who calls it reflection, to implement the Ring tactic in Coq [5]. Its
furthest application consists in formalizing type theory inside type theory itself
and using it to do metareasoning. This was partially done by Howe in [13] for
Nuprl and by Barras and Werner in [3] for Coq.

We apply it to inductive definitions. First of all we define a type of codes for
positive type operators PosOp. To every element c: PosOp we associate a positive
type operator (TypeOp c): * → * and an inductive type (IndType c): *, using
the technique of Section 5. Then we define an inductive predicate Positive on
type operators, which is an internalization of the notion of strict positivity (note
that we do not need to internalize strong positivity or positivity). We define
a function that associates an element of PosOp to every operator Φ: * → *
and proof p: (Positive Φ). So we can define an inductive type by proving that
the corresponding type operator is strictly positive. This can be done for the
families of operators defined by recursion, hence solving our initial problem.


Definition 5. The type PosOp: □ is defined by the following introduction rules:


K: *
-------------------
(op-const K): PosOp


op-id: PosOp


op_1: PosOp    op_2: PosOp
---------------------------
(op-prod op_1 op_2): PosOp


op_1: PosOp    op_2: PosOp
---------------------------
(op-sum op_1 op_2): PosOp


K: *    op: PosOp
---------------------
(op-fun K op): PosOp


We can associate an actual type operator to every element of PosOp, by recursion
on it:


TypeOp: PosOp → * → *
(op-const K)         ⇒ [X: *]K
op-id                ⇒ [X: *]X
(op-prod op_1 op_2)  ⇒ [X: *](TypeOp op_1 X) × (TypeOp op_2 X)
(op-sum op_1 op_2)   ⇒ [X: *](TypeOp op_1 X) + (TypeOp op_2 X)
(op-fun K op)        ⇒ [X: *]K → (TypeOp op X).


Unfortunately this approach leads us to a dead end, since the family of operators 
TypeOp is strongly positive but not positive, being obtained by recursion. 

We apply instead the technique of Section 5 to transform TypeOp from a 
recursive family to an inductive one satisfying the positivity condition: 


inductive IndOp [X: *]: PosOp → * :=
   c_const: (K: *)K → (IndOp (op-const K))
   c_id:    X → (IndOp op-id)
   c_prod:  (op_1, op_2: PosOp)(IndOp op_1) → (IndOp op_2)
               → (IndOp (op-prod op_1 op_2))
   c_sum,l: (op_1, op_2: PosOp)(IndOp op_1) → (IndOp (op-sum op_1 op_2))
   c_sum,r: (op_1, op_2: PosOp)(IndOp op_2) → (IndOp (op-sum op_1 op_2))
   c_fun:   (K: *)(op: PosOp)(K → (IndOp op)) → (IndOp (op-fun K op))
end.


Lemma 1. For every op: PosOp, X: * ⊢ (IndOp X op) is a positive operator.


Proof Just check that the requirements of the new clause in Definition 4 are
satisfied. □

Thus, in the type system of Coq, we can associate an inductive type to every 
element of PosOp: 


IndType := [op: PosOp]μ_X(IndOp X op): PosOp → *.


Whenever we have a family of type operators X: * ⊢ Ψ[X]: I → * defined by
recursion on an inductive type I, we can associate to it a function f_Ψ: I → PosOp
and obtain the family of inductive types as [x: I](IndType (f_Ψ x)). For example,
the family Ψ from Formula 3 is translated into the function


f_Ψ: N → PosOp
(f_Ψ 0)     := (op-const N_1)
(f_Ψ (S n)) := (op-sum (op-const N_1) (op-prod op-id (f_Ψ n)))
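The codes of Definition 5, the recursive interpretation TypeOp, and the translation f_Ψ can be mimicked on finite types. The Python sketch below is an illustration only: it replaces each constant type K by its number of elements, so that the interpretation returns the number of elements of (TypeOp op X) for a finite X. The class names, the function names type_op_size and f_psi, and this restriction to finite cardinalities are assumptions of the sketch, not part of the formalization.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class OpConst:
    size: int  # number of elements of the constant type K


@dataclass(frozen=True)
class OpId:
    pass


@dataclass(frozen=True)
class OpProd:
    left: "PosOp"
    right: "PosOp"


@dataclass(frozen=True)
class OpSum:
    left: "PosOp"
    right: "PosOp"


@dataclass(frozen=True)
class OpFun:
    domain_size: int  # number of elements of the domain type K
    body: "PosOp"


PosOp = Union[OpConst, OpId, OpProd, OpSum, OpFun]


def type_op_size(op: PosOp, x: int) -> int:
    """Number of elements of (TypeOp op X) when X has x elements."""
    if isinstance(op, OpConst):
        return op.size
    if isinstance(op, OpId):
        return x
    if isinstance(op, OpProd):
        return type_op_size(op.left, x) * type_op_size(op.right, x)
    if isinstance(op, OpSum):
        return type_op_size(op.left, x) + type_op_size(op.right, x)
    if isinstance(op, OpFun):
        return type_op_size(op.body, x) ** op.domain_size
    raise TypeError(f"not a PosOp code: {op!r}")


def f_psi(n: int) -> PosOp:
    """Code of the n-th operator of the family, following the definition of f_Psi."""
    if n == 0:
        return OpConst(1)
    return OpSum(OpConst(1), OpProd(OpId(), f_psi(n - 1)))
```

The point of the encoding is that f_psi is an ordinary recursive function into a data type of codes, so the family of operators is obtained by composing it with a single interpretation function.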

Moreover, we can avoid this translation by proving directly the positivity of
the original operators inside type theory.


Definition 6. The predicate Positive: (* → *) → * is inductively defined by the
following rules:


K: *
-----------------------------------
(pos-const K): (Positive [X: *]K)


pos-id: (Positive [X: *]X)


Φ_1: * → *    Φ_2: * → *    p_1: (Positive Φ_1)    p_2: (Positive Φ_2)
----------------------------------------------------------------------
(pos-prod Φ_1 Φ_2 p_1 p_2): (Positive [X: *](Φ_1 X) × (Φ_2 X))


Φ_1: * → *    Φ_2: * → *    p_1: (Positive Φ_1)    p_2: (Positive Φ_2)
----------------------------------------------------------------------
(pos-sum Φ_1 Φ_2 p_1 p_2): (Positive [X: *](Φ_1 X) + (Φ_2 X))


K: *    Φ: * → *    p: (Positive Φ)
---------------------------------------------
(pos-fun K Φ p): (Positive [X: *]K → (Φ X))


It is straightforward to define a function pos-code: (Φ: * → *)(Positive Φ) →
PosOp by recursion on the proof of (Positive Φ). In conclusion, given a recursive
family of operators, we can prove by induction that every element of the family
is positive and then obtain the recursive family of inductive types by composing
IndType and pos-code.


Lemma 2. Every type operator Φ: * → * such that (Positive Φ) is provable, has
an initial algebra.


Finally we can apply this method to the strongly positive operators.

Theorem 4. If X: * ⊢ Φ[X]: * is a strongly positive operator, then there is a
proof of (Positive [X]Φ[X]).

Proof We just formalize the proof of Theorem 1. Since we are now developing
the proof inside type theory, the requirement that no free variable except X
appears in Φ is no longer necessary. Hence the result holds for every strongly
positive operator.


7 Conclusion 


We have considered the problem of defining families of inductive types whose
constructors are given by recursion. These families occur naturally in some
developments of abstract mathematics in type theory. We characterized them with
the notion of strongly positive operator. We described a model of them in type
theory that uses wellorderings. We showed that a more manageable model can
be constructed in a type theory with an extended notion of inductive definition.
Finally we generalized the latter model to a complete internalization of inductive
definitions. This last part was completely formalized in Coq.


Abstract. The Airborne Information for Lateral Spacing (AILS) program
at NASA Langley Research Center aims at giving pilots the information
necessary to make independent approaches to parallel runways
with spacing down to 2500 feet in Instrument Meteorological Conditions.
The AILS concept consists of accurate traffic information visible on
the navigation display and an alerting algorithm which warns the crew
when one of the aircraft involved in a parallel landing is diverting from
its intended flight path. In this paper we present a model of aircraft
approaches to parallel runways. Based on this model, we analyze the
alerting algorithm with the objective of verifying its correctness. The
formalization is conducted in the general verification system PVS.


1 Introduction 


The Airborne Information for Lateral Spacing (AILS) [12,3,6] is a project being 
conducted at NASA Langley Research Center. Its objective is to reduce traffic 
delays and increase airport efficiency by enabling approaches to closely spaced 
parallel runways in Instrument Meteorological Conditions. 

Approaches to parallel runways are currently limited to 4300 feet in Instru- 
ment Meteorological Conditions. Specially equipped airports with fast scan ra- 
dars, high resolution monitoring systems, and approach-specific air traffic con- 
trollers can perform parallel approaches to 3400 feet [14,8]. The AILS project 
aims at shifting the responsibility of maintaining separation during parallel ap- 
proaches from the air traffic controller to the aircraft crew. Via the AILS concept, 
approaches to parallel runways 2500 feet apart in Instrument Meteorological 
Conditions are expected. 

AILS eliminates the delay inherent in the communication between air traffic
controller and crew by displaying parallel traffic information in the cockpit. The
degree of safety is enhanced by an alerting system which warns the crew when
one of the aircraft involved in a parallel landing is deviating from the intended
flight path. The alerting algorithm is a critical part of the AILS concept. Flaws
in its logic could lead to non-alerted collision incidents. The algorithm has been
extensively tested in simulators and in real flights.

The objective of this work is to conduct a formal analysis of the alerting 
algorithm in order to discover any possible errors that have not been detected 
during testing and simulation. In particular, we develop a formal model of par- 
allel landing scenarios. Based on this model, we study the behavior of the AILS 
alerting algorithm with respect to collision incidents. In particular, we have fo- 
und maximum and minimum times when an alarm will first sound prior to a 
collision. Indeed, we have proven that for any trajectory leading to a collision, 
an alarm is issued at least 4 seconds before the collision. Conversely, we have 
found that there exist trajectories leading to a collision where the alarm will not 
sound before 11 seconds. We believe that for all cases the largest time prior to 
a collision when the alarm will first sound is closer to 11 than to 4. 

The paper is organized as follows. First, in section 2, we briefly review the
alerting features which are integrated in the AILS concept. Next, in section 3,
we describe in detail the AILS alerting algorithm. We model aircraft trajectories
and collision scenarios in section 4. Section 5 contains the main properties that
we have formally proven. Finally, we conclude with some remarks in section 6.
The formalization presented in this paper has been developed in the general
verification system PVS [11]. We use a stylized LaTeX PVS concrete syntax and
assume the reader is familiar with standard notations of higher-order logic.


2 System Description 


[Figure: parallel runway approach, with labels Two Dot Deviation, One Dot
Deviation, and Localizer Track]


Fig. 1. Parallel runway approach


92 V. Carreño and C. Muñoz


In a typical independent parallel approach, depicted in Figure 1, aircraft in- 
tersect their localizer track (longitudinal runway center) approximately 10 nau- 
tical miles from the runway threshold. During localizer intersection, aircraft have 
a 1000 feet vertical separation. After the aircraft are established in their localizer 
track, vertical separation is eliminated and aircraft start a normal glide path for 
landing. 

The AILS alerting system starts operating when the aircraft are on their 
localizers. At this time the aircraft are approximately at the same altitude. As 
explained later, one aircraft is assumed to be the intruder and the other is assu- 
med to be the evader. The scenario is then reversed. When the intruder aircraft 
deviates from its airspace, the AILS system provides 6 alert levels, depending 
on the severity of the deviation. Table 1 shows an alerting sequence as seen in 
the evader and intruder aircraft primary and navigation displays. 


Table 1. Alerting sequence 


     Evader                        Intruder
1                                  Localizer alert (one dot deviation)
2                                  Localizer alert (two dot deviation)
3                                  Caution alert (traffic)
4    Caution alert (traffic)
5                                  Warning alert (collision)
6    Warning alert (collision)


All alerts in the intruder aircraft are expected to be followed by a corrective 
maneuver. The evader aircraft is not expected to perform an evasive maneuver 
until a warning alert is issued, at which time landing is aborted and an emergency 
escape maneuver is performed. Notice that the intruder aircraft always receives 
a caution or warning alert before the respective caution or warning alerts are 
issued to the evader. 

An algorithm implementing the alerting features explained above runs in- 
dependently on each aircraft. It runs twice every 0.5 seconds. The first time 
the algorithm assumes that the own-ship is the intruder aircraft and the adja- 
cent aircraft is the evader. In the next iteration the algorithm assumes that the 
own-ship is the evader and the adjacent aircraft is the intruder. 

Several assumptions were made by the AILS project researchers in the deve- 
lopment of the alerting algorithm. These assumptions are justified by physical 
characteristics and operational constraints. They are as follows: 


- Time is discrete and divided in increments of 0.5 seconds. In our model, we
  call this value tstep.

- The rate of turn is determined by the bank angle and ground speed.

- The speeds of the aircraft are constant. Henceforth, we use intruderSpeed
  and evaderSpeed as the constant speed values of the intruder and evader
  aircraft, respectively.

- The vertical separation between the aircraft is assumed to be 0 during a
  landing approach.

- Only the intruder aircraft will deviate from its path in a parallel approach.
  The evader aircraft is assumed to stay in its localizer with a heading angle
  of 0°.


It should be noted that the experimental AILS system, as currently designed, 
forms part of the Traffic Alert and Collision Avoidance System (TCAS) [13]. In 
this work, we assume that the AILS alerting algorithm is running in isolation 
from other aircraft components. In addition, we concentrate on the caution and 
warning alerting kernel of the AILS alerting system. The one dot and two dot 
deviation alerts present a simple scenario and can be easily added to our model 
by a separate function as it is done in the current implementation. 


3 The AILS Alerting Algorithm


In this section we describe the alerting algorithm. We start in subsection 3.1 with 
a detailed, but informal, description of the actual algorithm. Then, in section 3.2, 
we abstract and formalize it in the PVS specification language. 


3.1 Detailed Description 


The alerting algorithm determines when an alarm will be triggered by calculating 
possible collision trajectories and comparing the future aircraft locations with 
predetermined time and distance thresholds. The algorithm is executed in two 
modes every tstep seconds: (1) the first mode assumes its own aircraft is a threat 
to the adjacent aircraft and the adjacent aircraft is following the localizer; (2) 
the second mode assumes the adjacent aircraft is a threat to its own and the 
own is following the localizer. In either mode, one aircraft is the intruder and 
one is the evader. 

The algorithm considers two cases depending on whether the intruder is 
changing direction or not. When the intruder aircraft is not changing direction, 
i.e., its bank angle is 0, the algorithm determines if the two aircraft are diverging
or converging and the point of closest separation. This is done by obtaining the 
derivative of the distance between the aircraft and solving for time when the 
derivative equals zero as follows. 


\begin{align}
\Delta_x(t) &= x_{in}(t) - x_{ev}(t) \tag{1}\\
\Delta_y(t) &= y_{in}(t) - y_{ev}(t) \tag{2}\\
\frac{d}{dt}\Delta_x(t) &= \mathit{intruderSpeed} \times \cos(\theta) - \mathit{evaderSpeed} \tag{3}\\
\frac{d}{dt}\Delta_y(t) &= \mathit{intruderSpeed} \times \sin(\theta) \tag{4}\\
R(t) &= \sqrt{\Delta_x(t)^2 + \Delta_y(t)^2} \tag{5}\\
\frac{d}{dt}R(t) &= \frac{\Delta_x(t) \times \frac{d}{dt}\Delta_x(t) + \Delta_y(t) \times \frac{d}{dt}\Delta_y(t)}{R(t)} \tag{6}
\end{align}

For a time t, (x_in(t), y_in(t)) and (x_ev(t), y_ev(t)) are the coordinates of the
intruder and evader aircraft, respectively, and θ is the heading angle of the
intruder aircraft. When d/dt R(t + τ) = 0, we get the time τ, relative to t, of the
point of closest separation of the aircraft. Time τ has been calculated as

\[
\tau(t) = -\frac{\Delta_x(t) \times \frac{d}{dt}\Delta_x(t) + \Delta_y(t) \times \frac{d}{dt}\Delta_y(t)}{\left(\frac{d}{dt}\Delta_x(t)\right)^2 + \left(\frac{d}{dt}\Delta_y(t)\right)^2} \tag{7}
\]

Equations 3, 4, 6, and 7 were formally deduced by using the computer algebra
tool MuPAD [4]. Notice that τ is undetermined when the aircraft are parallel and
the ground speeds are equal. In this case, the alerting algorithm defines τ(t) = 0
for any t. Since the evader aircraft is assumed to stay in its localizer with a
heading angle of 0°, it does not have a y-speed component. This is reflected in
Equation 4.

For a time t, if τ(t) is negative or zero, the tracks are diverging or parallel,
respectively. If τ(t) is greater than zero, the tracks are converging and τ(t) will be
the time of closest separation (Figure 2). When tracks are diverging or parallel,
the algorithm checks the aircraft separation at the present time against the
threshold distance for warning or caution alert. When tracks are converging, the
algorithm compares the time and distance of closest separation against time and
distance thresholds, respectively. In either case, an alarm is triggered when the
calculated time and distance are within the time and distance alert thresholds.


[Figure: converging tracks, with the point of closest separation marked]

Fig. 2. Converging tracks 
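The computation of the time of closest separation from Equations 1 to 7 can be sketched in a few lines of Python. This is an illustrative sketch, not the FORTRAN or PVS code: the function names, the use of degrees for the heading, and the sign convention (a positive heading increases the y-separation) are assumptions made here. The special case where the relative velocity vanishes returns 0, as the text states for parallel tracks with equal ground speeds.

```python
import math


def relative_motion(x_in, y_in, x_ev, y_ev, theta_deg, intruder_speed, evader_speed):
    """Return (dx, dy, ddx, ddy): the separations of Equations 1 and 2 and
    their time derivatives from Equations 3 and 4."""
    theta = math.radians(theta_deg)
    dx = x_in - x_ev
    dy = y_in - y_ev
    ddx = intruder_speed * math.cos(theta) - evader_speed
    ddy = intruder_speed * math.sin(theta)
    return dx, dy, ddx, ddy


def closest_approach_time(dx, dy, ddx, ddy):
    """Time tau of closest separation (Equation 7), relative to the current time."""
    denom = ddx ** 2 + ddy ** 2
    if denom == 0.0:
        return 0.0  # parallel tracks with equal ground speeds: tau is defined as 0
    return -(dx * ddx + dy * ddy) / denom


def separation_after(dx, dy, ddx, ddy, tau):
    """Distance R between the aircraft tau seconds from now (Equation 5),
    assuming straight tracks and constant speeds."""
    return math.hypot(dx + tau * ddx, dy + tau * ddy)
```

With the intruder 1000 feet to the side of the evader, equal speeds of 200 feet per second, and a heading of 30 degrees towards the evader, closest_approach_time returns a positive value and separation_after is smallest at that time; with a heading of 30 degrees away from the evader the result is negative, which corresponds to diverging tracks.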


When the intruder aircraft is changing direction, i.e., its bank angle is not 0, 
the algorithm calculates the radius of the turn and the rate of change of direction. 
Tangential tracks are calculated from the arc path as to produce tangents which 
are 1.5° to 3° in angular separation (Figure 3). For each of these tangential 
tracks the algorithm determines whether the two aircraft tracks are diverging or 
converging and performs time and distance comparisons as explained above. 


3.2 PVS Abstraction 


[Figure: circular arc path at current bank angle, with tangent tracks 1.5 to 3
degrees apart]


Fig. 3. Radial trajectory and tangential tracks


The original AILS algorithm was written in FORTRAN at Langley Research
Center. It has been revised several times and the latest version flown in the
Boeing 757 experimental aircraft was provided by Honeywell. For the work
presented in this paper, we created a high level abstract model of the alerting
algorithm in the PVS language. The algorithm model uses the same strategy as the
FORTRAN algorithm to determine if alarms are triggered, as explained above.
All of the PVS declarations involved in the modeling of the algorithm can be seen
in the theory file available at http://shemesh.larc.nasa.gov/people/vac/ails.pvs.

The model of the algorithm is a function which takes the states of the aircraft
and returns a Boolean value corresponding to whether the alarm is triggered
or not. The type of the alarm, caution or warning, depends on the threshold
parameters. However, we only consider a generic type of alarm which abstracts
from warning and caution alarms. The state of an aircraft is defined by a record
with fields x, y: the position coordinates; heading: the angle between the flight
path and the localizer track; and bank: the bank angle, which ranges between
−45° and 45° (type Bank). In PVS:


Bank : TYPE = {r:real | -45 < r < 45} 


State : TYPE =
  [# x       : real,
     y       : real,
     heading : real,
     bank    : Bank
  #]


Access to records can be written in PVS as function calls, i.e., if s is a State,
x(s) refers to the field x of the state s.
The model of the alerting algorithm is given next.


larcalert(intruder,evader:State): bool =
  LET phi = bank(intruder) IN
  LET trkrate = g × (180/π) × tand(phi)/intruderSpeed IN
    IF trkrate = 0 THEN                    % Direction is not changing.
      chktrack(intruder,evader,0)          % Check straight tracks.
    ELSE                                   % Direction is changing.
      LET arcrad =                         % Calculate arc radius.
        intruderSpeed²/(g × tand(phi)) IN
      LET idtrk =
        IF abs(trkrate) > 3 THEN 1         % This determines
        ELSIF abs(trkrate) > 1+1/2 THEN 2  % how often
        ELSIF abs(trkrate) > 3/4 THEN 4    % tangential
        ELSE 8                             % tracks are
        ENDIF IN                           % calculated.
      arc_loop(intruder,evader,arcrad,trkrate,idtrk,0)
    ENDIF


where g is the gravitational acceleration constant (approx. 32.2 feet/second²).
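The quantities computed at the start of larcalert can be checked numerically with a small Python sketch. The function names and the choice of feet per second for the speed are assumptions made here for illustration; the thresholds are the ones that appear in the definition of idtrk above.

```python
import math

G = 32.2  # gravitational acceleration in feet/second^2, as stated in the text


def track_rate(bank_deg: float, speed: float) -> float:
    """Rate of change of heading in degrees per second for a coordinated turn
    at the given bank angle (degrees) and ground speed (assumed feet/second)."""
    return G * (180.0 / math.pi) * math.tan(math.radians(bank_deg)) / speed


def arc_radius(bank_deg: float, speed: float) -> float:
    """Radius of the turn in feet; only meaningful for a non-zero bank angle."""
    return speed ** 2 / (G * math.tan(math.radians(bank_deg)))


def tangent_interval(trkrate: float) -> int:
    """Number of time steps between tangential tracks (the value idtrk)."""
    rate = abs(trkrate)
    if rate > 3:
        return 1
    if rate > 1.5:
        return 2
    if rate > 0.75:
        return 4
    return 8
```

The two formulas are consistent with each other: the speed divided by the arc radius is the angular velocity in radians per second, which equals the track rate once converted to degrees. For a bank angle of 30 degrees at 250 feet per second the track rate is a little over 4 degrees per second, so a tangential track would be computed at every step.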

The first part of the function larcalert is exercised when the track rate 
(trkrate) is zero and there is no change in the intruder’s heading. In that 
case, the function chktrack makes the calculation for converging or diverging 
tracks, according to Equations 1 to 7. If the tracks are diverging, the function 
chkrange is called to compare present locations against time and distance thres- 
holds (alertTime and alertRange, respectively). If the tracks are converging, 
predicted locations at alert time or time of closest separation, whichever is
smaller, are compared. An alarm is issued when calculated time and distance
values are within the range of time and distance alert thresholds. 

The structure of the definitions of chkrange and chktrack is given next.


chkrange(range,t:real): bool =
  range < alertRange ∧ t < alertTime


chktrack(intruder,evader:State,t:real): bool =
  LET range = R(t) IN
  LET tau = τ(t) IN
    IF tau ≤ 0 THEN                  % Tracks are diverging (or parallel).
      chkrange(range,t)              % Check range at prediction time t.
    ELSE                             % Tracks are converging.
      IF t+tau > alertTime THEN      % Closest separation beyond alert time.
        R(alertTime) < alertRange    % Check range at alert threshold.
      ELSE                           % Closest separation within alert time.
        R(t+tau) < alertRange        % Check range at time of
      ENDIF                          % closest separation.
    ENDIF
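The decision structure of chkrange and chktrack can be exercised in isolation with the following Python sketch. It is a simplification made for illustration: the predicted separation is passed in as a function range_at and the time of closest separation as a number tau, instead of being computed from the aircraft states, and the thresholds are explicit parameters. Following the prose description, a value of tau that is negative or zero is treated as diverging or parallel tracks. The threshold values used in the example below are hypothetical.

```python
from typing import Callable


def chkrange(rng: float, t: float, alert_range: float, alert_time: float) -> bool:
    """Alarm if the separation and the prediction time are both within thresholds."""
    return rng < alert_range and t < alert_time


def chktrack(range_at: Callable[[float], float], tau: float, t: float,
             alert_range: float, alert_time: float) -> bool:
    """Decide whether a straight track triggers an alarm.

    range_at(s) is the predicted separation at time s, and tau is the time of
    closest separation relative to t."""
    if tau <= 0:
        # Diverging or parallel tracks: check the separation at time t.
        return chkrange(range_at(t), t, alert_range, alert_time)
    if t + tau > alert_time:
        # Closest separation lies beyond the alert time: check at the threshold.
        return range_at(alert_time) < alert_range
    # Closest separation lies within the alert time: check it directly.
    return range_at(t + tau) < alert_range
```

For example, with a separation that shrinks from 1000 feet to 0 over 10 seconds, an alert range of 200 feet, and an alert time of 30 seconds, chktrack reports an alarm; with an alert time of only 5 seconds the separation at the threshold is still 500 feet and no alarm is reported.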


The second part of the function larcalert handles the case when the intruder
is changing direction. The arc radius is calculated and the function arc_loop
generates the tangential tracks from the arc trajectory. The function arc_loop
is a recursive function modeling a DO-LOOP statement. It is used to iterate
the function chktrack on tangential tracks every idtrk time steps. The actual
definition of arc_loop is too long to be included in the paper and can also be
seen in the theory file as pointed out above. The structure of the function is:


arc_loop(intruder,evader,arcrad,trkrate,idtrk,iarc): RECURSIVE bool = 
  IF iarc = MaxStep THEN FALSE 
  ELSE 
    calculate positions of aircraft 
    IF not time for a tangential track THEN 
      IF chkrange(...) THEN          % Check range at that point. 
        TRUE                         % Trigger an alarm. 
      ELSE 
        arc_loop(...,iarc+1)         % Go to new iteration. 
      ENDIF 
    ELSE                             % Time for tangential tracks. 
      IF chktrk(...) THEN            % Check track at this point. 
        TRUE                         % Trigger an alarm. 
      ELSE 
        arc_loop(...,iarc+1)         % Go to new iteration. 
      ENDIF 
    ENDIF 
  ENDIF 


Based on the idtrk argument and the step in the loop iarc, the function 
arc_loop determines if a tangential track is calculated or not. If a tangential 
track is not calculated, the function chkrange compares the distance between 
the calculated positions of the aircraft and the distance threshold. The function 
chktrk is used to check for collisions on all the tangential tracks in the loop. The 
function arc_loop terminates when one of the functions chkrange or chktrack 
triggers an alarm or when iarc has reached a constant MaxStep defined as 
alert_time/tstep. 
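
The control flow just described can be sketched as an ordinary loop. The Python below is illustrative only: the two checks are passed in as hypothetical callables indexed by the loop step, and the rule "a tangential track is due when iarc is a multiple of idtrk" is our assumption, since the actual test is part of the elided definition.

```python
from typing import Callable


def arc_loop(range_alarm: Callable[[int], bool],
             track_alarm: Callable[[int], bool],
             idtrk: int,
             max_step: int) -> bool:
    """Iterative rendering of the recursive arc_loop structure.

    range_alarm(i) stands for chkrange(...) at step i and track_alarm(i)
    for the tangential-track check at step i; both are hypothetical stand-ins.
    """
    for iarc in range(max_step):
        if iarc % idtrk != 0:
            # Not time for a tangential track: check range at this point.
            if range_alarm(iarc):
                return True
        else:
            # Time for a tangential track: check the track at this point.
            if track_alarm(iarc):
                return True
    # iarc reached MaxStep without any alarm.
    return False
```

The recursion in the PVS definition terminates for the same reason the loop does: iarc increases by one at every call until it reaches MaxStep.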

In the PVS model, we are using an axiomatic definition of the square root 
function (sqrt, see section 5). Trigonometric functions (sind, cosd, and tand, 
for sine, cosine, and tangent of angles in degrees, respectively) are defined by 
series approximations. However, as we will see in section 5, we also provide 
axioms about trigonometric functions to facilitate the proofs. 

As we have seen, the AILS algorithm considers a limited set of possible tra- 
jectories for the intruder aircraft, i.e., assuming a constant radius turn at the 
original bank angle, only tangent track escapes to the turn arc are considered. 
The developers of the algorithm state that this assumption is reasonable under 
normal circumstances, i.e., the intruder aircraft is not intentionally trying to 
collide with the evader aircraft. However, to evaluate the behavior of the algo- 
rithm in a wider range of possible landing scenarios, a more general model of 
trajectories for the intruder aircraft is necessary. In the next section, we develop 
such a model. 

4 Parallel Landing Scenarios 


According to the characteristics and assumptions of the AILS algorithm, we 
propose a time-discrete model of trajectories with time increments of tstep 
seconds. In that model, as in the case of the alerting algorithm, intrusion paths 
are determined by the bank angle and ground speed of the intruder aircraft. 
Given a ground speed gs > 0 and a bank angle φ, the heading turn rate is given by 


trkrate(gs, φ) = (g × tand(φ)) / gs 


where g is the gravitational acceleration constant. 

Although under normal operation the bank angle of a commercial aircraft 
is limited to −30° to 30°, we allow the bank angle to range from −45° to 45°. 
For a minimum ground speed of 180 feet per second, this means a maximum 
heading turn rate of about 6° per second. These values produce very aggressive 
blundering situations, quite consistent with the worst-case scenarios tested by the 
AILS development group. Incidentally, the function trkrate is well-defined for 
bank angles in that range. 


Definition 1 (Intruder trajectory). An intruder trajectory of length n for 
an aircraft with state s and ground speed gs is a sequence of states in_0 ... in_n 
such that in_0 = s and for 0 < i ≤ n, 


1. |heading(in_i) − heading(in_{i−1})| = tstep × trkrate(gs, bank(in_i)), 
2. x(in_i) = x(in_{i−1}) + gs × tstep × cosd(heading(in_i)), and 
3. y(in_i) = y(in_{i−1}) + gs × tstep × sind(heading(in_i)). 


In PVS, we define the next state of an intruder aircraft at state s and bank 
angle φ by the function 


next_intruder_state(s:State,φ:Bank): State = 
  s WITH [ 
    x := x(s) + intruderSpeed×tstep×cosd(heading(s)), 
    y := y(s) + intruderSpeed×tstep×sind(heading(s)), 
    heading := heading(s) + tstep×trkrate(intruderSpeed,bank(s)), 
    bank := φ 
  ] 


The notation WITH is the record (and function) overriding operator in PVS. 

We model an intruder trajectory by a recursive function having as parameters 
an initial state s, a bank angle assignment for each iteration step tr, and the 
iteration step n, as follows 


intruder_trajectory(s:State, tr:[posnat → Bank], n:nat): 
  RECURSIVE State = 
  IF n = 0 THEN s 
  ELSE 
    next_intruder_state(intruder_trajectory(s, tr, n-1),tr(n)) 
  ENDIF 
  MEASURE n 
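
To make the recursion concrete, the following Python sketch mirrors next_intruder_state and intruder_trajectory. Several values are assumptions made for illustration: tstep = 0.5 s, intruderSpeed = 250 ft/s, and a turn rate computed from the standard coordinated-turn relation g·tan(φ)/gs converted to degrees per second, which may differ from the constant used in the PVS theory.

```python
import math
from dataclasses import dataclass
from typing import Callable

G = 32.2                # ft/s^2, as quoted in the text
TSTEP = 0.5             # s; assumed value
INTRUDER_SPEED = 250.0  # ft/s; assumed value


@dataclass(frozen=True)
class State:
    x: float        # feet
    y: float        # feet
    heading: float  # degrees
    bank: float     # degrees


def trkrate(gs: float, bank_deg: float) -> float:
    """Heading turn rate in degrees per second (assumed coordinated-turn formula)."""
    return math.degrees(G * math.tan(math.radians(bank_deg)) / gs)


def next_intruder_state(s: State, phi: float) -> State:
    """One time step: move along the current heading, turn by the current bank."""
    h = math.radians(s.heading)
    return State(
        x=s.x + INTRUDER_SPEED * TSTEP * math.cos(h),
        y=s.y + INTRUDER_SPEED * TSTEP * math.sin(h),
        heading=s.heading + TSTEP * trkrate(INTRUDER_SPEED, s.bank),
        bank=phi,
    )


def intruder_trajectory(s: State, tr: Callable[[int], float], n: int) -> State:
    """Iterative equivalent of the recursive definition; tr(k) is the bank at step k >= 1."""
    state = s
    for k in range(1, n + 1):
        state = next_intruder_state(state, tr(k))
    return state
```

With a bank assignment that is constantly 0, the heading never changes and the aircraft advances intruderSpeed × tstep feet per step along a straight line.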


For example, given an intruder aircraft at initial state s and bank angle equal 
to 0, a trajectory of length n in which the plane follows a straight line along its 
current heading is given by in_0 ... in_n, where in_0 = s and for 0 < i ≤ n, 


in_i = intruder_trajectory(s, λ(n : posnat) : 0, i). 


For the evader aircraft, we assume that it stays on its localizer with a constant 
speed and constant heading of 0°. Heading and bank angles are irrelevant in the 
definition of an evader trajectory. 


Definition 2 (Evader trajectory). An evader trajectory of length n for an 
aircraft with state s and ground speed gs is a sequence of states ev_0 ... ev_n such 
that ev_0 = s and for 0 < i ≤ n, 

1. x(ev_i) = x(ev_{i−1}) + gs × tstep and 
2. y(ev_i) = y(ev_0). 

For an initial state s of an aircraft, its state after n steps in an evader trajectory 
is defined by evader_trajectory(s,n) as follows 


evader_trajectory(s:State, n:nat): State = 
  (# 
    x := x(s) + evaderSpeed×tstep×n, 
    y := y(s), 
    heading := heading(s), 
    bank := bank(s) 
  #) 


We are interested in trajectories leading to collision incidents. Aircraft are 
said to be in collision if the distance between them is less than or equal to 
collisionRange. In our development, we consider 200 feet for collisionRange, 
which is approximately the wing span of a Boeing 747. 


distance(s1,s2:State): real = 
  sqrt((x(s2)-x(s1))² + (y(s2)-y(s1))²) 


collision(s1,s2:State): bool = 
  distance(s1,s2) ≤ collisionRange 


Definition 3 (Collision scenario). Given an intruder trajectory in_0 ... in_n 
and an evader trajectory ev_0 ... ev_n, we say that they lead to a collision incident 
at step i, for 0 ≤ i ≤ n, if collision(in_i,ev_i) holds. 

A collision scenario is defined in PVS as follows 


collision_scenario(intruder,evader:State, tr:[posnat → Bank], 
                   i:nat): bool = 
  collision(intruder_trajectory(intruder,tr,i), 
            evader_trajectory(evader,i)) 
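
The evader model and the collision predicate translate directly into Python. In this sketch tstep = 0.5 s and evaderSpeed = 250 ft/s are assumed values, collisionRange = 200 ft is the figure given in the text, and the intruder trajectory is abstracted as a hypothetical callable intruder_at(i) standing for intruder_trajectory(intruder, tr, i).

```python
import math
from typing import Callable, NamedTuple

TSTEP = 0.5              # s; assumed value
EVADER_SPEED = 250.0     # ft/s; assumed value
COLLISION_RANGE = 200.0  # ft, as stated in the text


class State(NamedTuple):
    x: float
    y: float
    heading: float
    bank: float


def evader_trajectory(s: State, n: int) -> State:
    """Closed form: the evader only advances along x at constant speed."""
    return s._replace(x=s.x + EVADER_SPEED * TSTEP * n)


def distance(s1: State, s2: State) -> float:
    """Euclidean distance between the positions of two states."""
    return math.hypot(s2.x - s1.x, s2.y - s1.y)


def collision(s1: State, s2: State) -> bool:
    """Aircraft collide when they are within collisionRange of each other."""
    return distance(s1, s2) <= COLLISION_RANGE


def collision_scenario(intruder_at: Callable[[int], State],
                       evader: State, i: int) -> bool:
    """Collision at step i between the given intruder path and the evader."""
    return collision(intruder_at(i), evader_trajectory(evader, i))
```

Because the evader trajectory has a closed form, no recursion is needed on that side; only the intruder path depends on the step-by-step bank assignment.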


We have implemented the model of trajectories, together with our high-level 
version of the alerting algorithm, in Java. The implementation, available in the 
same location as the PVS theory files, serves a double purpose. First, it allows us 
to graphically visualize all the collision trajectories for a given time and initial 
values of the intruder and evader aircraft. Trajectories are difficult to visualize in 
PVS given the huge amount of data generated as output by the model. Second 
and more importantly, by studying those trajectories, we were able to extract 
conjectures that we have then formally proven in PVS. Conversely, as we will 
mention later, we have rejected some conjectures by finding counterexamples 
via simulation of collision trajectories. 

In the next section, we formally study in PVS the behavior of the alerting 
algorithm with respect to our model of collision trajectories. 


5 Main Properties 


The objective of this modeling and verification work is (1) to show that the method 
implemented in the algorithm to predict trajectories and trigger alarms is 
adequate and does not lead to dangerous situations, and (2) to explore possible 
trajectory scenarios which lead to unacceptable risk. To this effect we created 
models of the algorithm and aircraft trajectories in PVS, created simulations 
in Java to graphically visualize the behavior and characteristics of the landing 
scenario, and derived the equations of section 3 in the computer algebra tool 
MuPAD. 


5.1 Axioms on Continuous Mathematics 


Before stating the main properties, it should be said that most of the proofs 
require reasoning on continuous mathematics. We have assumed some uninter- 
preted functions and axioms in PVS, for instance 


sqrt(x:real) : {z:real | z² = x and z ≥ 0} 


sin_cos_sq_one : AXIOM 
  ∀ (x:real): sind(x)² + cosd(x)² = 1 


More involved properties, grounded on Equations 1 to 7, are also necessary, 
e.g., 


derivative_eq_zero_min : AXIOM 
  ∀ (t1,t2:real): R(t1+τ(t1)) ≤ R(t1+t2) 


decrease_zero_to_tau : AXIOM 
  ∀ (t,t1,t2:real) : 
    τ(t) > 0 ∧ t2 < τ(t) ∧ t1 < t2 
    ⇒ 
    R(t+t1) > R(t+t2) 


increase_tau_to_zero : AXIOM 
  ∀ (t,t1,t2:real) : 
    τ(t) < 0 ∧ t2 > τ(t) ∧ t1 > t2 
    ⇒ 
    R(t+t1) > R(t+t2) 

Axiom derivative_eq_zero_min states that at time t, τ(t) would be the 
time of closest separation between the aircraft. Axioms decrease_zero_to_tau 
and increase_tau_to_zero state that function R strictly decreases for 
times less than τ(t) and strictly increases for times greater than τ(t), 
respectively. 
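
These axioms can be sanity-checked numerically on a simple model. The sketch below does not use Equations 1 to 7; it assumes straight-line relative motion with constant relative velocity, for which the range and the time to closest separation have the standard closed forms shown. The function name make_R_tau and its parameters are ours.

```python
import math


def make_R_tau(dx: float, dy: float, vx: float, vy: float):
    """Range R and time-to-closest-separation tau for constant relative velocity.

    (dx, dy) is the relative position at time 0 and (vx, vy) the relative
    velocity, assumed not to be the zero vector.
    """
    speed_sq = vx * vx + vy * vy

    def R(t: float) -> float:
        return math.hypot(dx + vx * t, dy + vy * t)

    def tau(t: float) -> float:
        # Time, measured from t, at which the derivative of R squared vanishes.
        return -((dx + vx * t) * vx + (dy + vy * t) * vy) / speed_sq

    return R, tau


R, tau = make_R_tau(3000.0, 400.0, -100.0, 0.0)

# derivative_eq_zero_min: R is minimal at t1 + tau(t1).
minimum_ok = all(
    R(t1 + tau(t1)) <= R(t1 + t2) + 1e-9
    for t1 in range(0, 40, 5)
    for t2 in range(-20, 60, 5)
)
```

In this model R is the distance from a point to a line traversed at constant speed, so it decreases up to the closest point and increases afterwards, which is the behavior the two monotonicity axioms postulate.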


5.2 Finding a Time Prior to a Collision 


Our intention is to show that for all aircraft trajectories which lead to a collision 
and all initial states¹, an alarm is issued time seconds before a collision. In our 
formal development, we have found maximum and minimum bounds for the 
values of time. 

First, we have proven that an alarm (it can be caution or warning) 
is triggered when the distance between the aircraft is within the alerting range 
(alertRange). This property holds independently of the values of any other state 
variables of the aircraft. 


alarm_when_alerting_distance : THEOREM 
  ∀ (evader,intruder:State) : 
    alerting_distance(evader,intruder) ⇒ larcalert(evader,intruder) 


The theorem above establishes the largest lower bound on the elapsed time 
between an alert and a collision that we have found so far. For an alerting 
distance of 1400 feet and an intruder ground speed of 250 feet per second this 
results in an alarm at least 4 seconds before collision. 

An effort to prove that a caution is issued for a value of (alertTime-1) 
(alertTime being defined as 19 seconds) failed. Indeed, we have found a collision 
trajectory which allows two aircraft to fly from a 2500 feet y-separation to a 
distance of less than 1900 feet, without triggering an alarm 11 seconds before 
the collision. 


1 Recall from section 2 that initial states are when the aircraft are on their localizers. 

move_2500_to_1900_no_alarm_before_11_seconds : THEOREM 
  ∃ (intruder,evader:State, tr:[posnat → Bank], n:nat) : 
    collision_scenario(intruder,evader,tr,n+11/tstep) ∧ 
    abs(y(intruder)-y(evader)) = 2500 ∧ 
    distance(intruder_trajectory(intruder,tr,n), 
             evader_trajectory(evader,n)) < 1900 ∧ 
    ∀ (i: [0...n]): 
      ¬ larcalert(evader_trajectory(evader,i), 
                  intruder_trajectory(intruder,tr,i)) 


Intruder and evader trajectories that satisfy the above property are in_0 ... in_n, 
ev_0 ... ev_n, where 


in_0 = (# x := 860, y := 0, heading := 3, bank := 0 #) 
ev_0 = (# x := 0, y := 2500, heading := 0, bank := 0 #) 
tr = λ(n: posnat) : IF n < 122 THEN 0 ELSE 45 ENDIF 


and for 0 < i ≤ n, 


in_i = intruder_trajectory(in_0, tr, i) 
ev_i = evader_trajectory(ev_0, i) 


By combining these theorems, we can state that (1) there is a trajectory for 
which an alarm will not sound earlier than 11 seconds before a collision and 
(2) for all trajectories an alarm will sound at least 4 seconds before a collision. 
We believe that for all cases the largest time prior to a collision when the alarm 
will first sound is closer to 11 than to 4. 


5.3 Closing the Gap 


In order to find a largest time prior to a collision, we need to find strong invariants 
on collision trajectories. Notice, for example, that for an intruder trajectory 
in_0 ... in_n and an evader trajectory ev_0 ... ev_n, it cannot be the case that they 
lead to a collision incident at step n when distance(in_0,ev_n) > R, where 


R = collisionRange + intruderSpeed × n × tstep. 


Indeed, any intruder aircraft outside the circle of center (x(ev_n),y(ev_n)) and 
radius R needs more time than n × tstep to reach any point of the circle of 
center (x(ev_n),y(ev_n)) and radius collisionRange. The property above can 
be expressed in PVS as follows. 


collision_invariant : LEMMA 
  ∀ (intruder,evader:State, tr:[posnat → Bank], n:nat) : 
    collision_scenario(intruder,evader,tr,n) 
    ⇒ 
    ∀ (i: [0...n]): 
      distance(intruder_trajectory(intruder,tr,i), 
               evader_trajectory(evader,n)) ≤ 
      collisionRange + intruderSpeed×(n-i)×tstep 

The proof of the invariant above requires the following lemma. 


distance_invariant : LEMMA 
  ∀ (intruder,evader:State, tr:[posnat → Bank], n:nat) : 
    distance(intruder_trajectory(intruder,tr,n),evader) ≤ 
    distance(intruder_trajectory(intruder,tr,n+1),evader) + 
    intruderSpeed×tstep 


Lemma distance_invariant states that, with respect to a fixed evader position, 
one step in a straight trajectory leads farther than one step in any other direction. 
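
Both facts are instances of the triangle inequality, and a small numerical sketch may help to see why. The Python below uses collisionRange = 200 ft from the text; intruderSpeed = 250 ft/s and tstep = 0.5 s are assumed values, and the helper names are ours.

```python
import math

COLLISION_RANGE = 200.0  # ft, as stated in the text
INTRUDER_SPEED = 250.0   # ft/s; assumed value
TSTEP = 0.5              # s; assumed value


def reach_radius(steps_left: int) -> float:
    """Radius R outside of which a collision within steps_left steps is impossible."""
    return COLLISION_RANGE + INTRUDER_SPEED * steps_left * TSTEP


def collision_still_possible(dist_to_final_evader_pos: float, steps_left: int) -> bool:
    """Necessary condition expressed by collision_invariant."""
    return dist_to_final_evader_pos <= reach_radius(steps_left)


def step(x: float, y: float, heading_deg: float):
    """Move one time step along the given heading."""
    h = math.radians(heading_deg)
    return (x + INTRUDER_SPEED * TSTEP * math.cos(h),
            y + INTRUDER_SPEED * TSTEP * math.sin(h))


def step_bound_holds(ex: float, ey: float) -> bool:
    """Check distance_invariant from the origin for headings 0, 10, ..., 350 degrees."""
    d0 = math.hypot(ex, ey)
    for heading in range(0, 360, 10):
        x1, y1 = step(0.0, 0.0, float(heading))
        d1 = math.hypot(ex - x1, ey - y1)
        if d0 > d1 + INTRUDER_SPEED * TSTEP + 1e-9:
            return False
    return True
```

Applying the one-step bound n − i times is what yields the invariant: each step can reduce the distance to a fixed point by at most intruderSpeed × tstep.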

We intend to use the above invariant and lemmas, together with properties 
derived from the physical trajectories, to find a bound greater than 4 seconds 
for any collision scenario. Under the assumption that the intruder bank angle is 
zero, we have proven that an alarm is issued 19 seconds before a collision. That 
property is expressed in PVS as follows 


alarm_before_19_seconds_to_collision : THEOREM 
  bank(intruder) = 0 ∧ 
  collision_scenario(intruder,evader,straight_trajectory,m+38) 
  ⇒ 
  (∀ (i:subrange(m,m+38)): 
    larcalert(intruder_trajectory(intruder,straight_trajectory,i), 
              evader_trajectory(evader,i))) 


We are trying to generalize the proof for an arbitrary trajectory and a time of 9 
seconds. 


6 Conclusion 


Several case studies have been performed on the application of hybrid automata 
to the modeling of systems which include continuous and discrete domains. In 
particular, a simplified TCAS system was modeled in [9] using hybrid automata. 
That work focuses on establishing a hybrid model of the closed loop system formed 
by several aircraft flying under TCAS assumptions. Although it is claimed 
that the model is suitable for formal analysis, there is no explicit attempt to 
automate the proof process. On the other hand, state exploration techniques have 
been used to analyze the system requirements specification of TCAS II written 
in RSML [7]; we refer for instance to [5,2]. These works focus on the reactive 
aspect of the whole system. 

In the work presented in this paper, we constructed a formal model of the 
kernel of an alerting algorithm and we studied its behavior with respect to a 
model of collision trajectories. We defer the integration of the alerting algorithm 
with the rest of the system, for example TCAS, for future research. 

An abstract model of the algorithm and its properties were developed in the 
general verification system PVS. We complemented the prover capabilities with 
computer algebra tools. Indeed, differential equations, resulting from physical 
phenomena, were mechanically verified in MuPAD. Models of the algorithm and 
collision trajectories were implemented in Java. The implementation allowed us 
to graphically explore collision scenarios before performing rigorous attempts to 
prove properties. 

Although we have confidence in the conjectures that have been declared as 
axioms, work is being performed [10] in the development of a PVS library on 
transcendental functions which complements a previous work on mathematical 
analysis in PVS [1]. Hence, it might be possible in the near future to replace the 
axiomatic definitions with theorems. 

Lower and upper bounds for a time when an alarm will be issued before a col- 
lision were found. Our immediate goal, in the verification of the AILS algorithm, 
is to prove certain facts about the characteristics of the aircraft trajectories. We 
hope that these facts allow us to prove the adequacy of the alerting algorithm 
for a time large enough to avoid any possible collision incident. 


Intel’s Formal Verification Experience on the 
Willamette Development 


Bob Colwell and Bob Brennan 

In some ways, microprocessor design quality has improved tremendously since 
the ’70’s and ’80’s. It was not uncommon in those days to have to respin silicon 
five or ten times before the device exhibited even basic functionality. Microprocessors 
were designed by circuit designers directly into schematics, with little 
pre-silicon functional testing. In the late ’80’s, register transfer level (RTL) 
descriptions of CPUs, such as the Intel® 486 processor, were written, allowing 
useful pre-silicon validation to be performed. This validation was primarily black- 
box assembly tests written by humans, and it worked because the CPUs were 
simple designs that could be controlled and observed directly from such tests. 
Later CPUs required much more intensive pre-silicon efforts, including random 
code testing and massive amounts of simulation cycles, to get around human 
limitations on test writing productivity and insufficient imagination in knowing 
where to look for bugs. Modern designs (’90’s until present) often run operating 
systems successfully on first silicon, despite having microarchitectures that are 
orders of magnitude more complicated than their predecessors. 

That was the good news. The not-so-good news is that a confluence of negative 
trends is now threatening microprocessor designers. Because performance 
and clock rate have become the metric of choice for buyers, CPU designers are 
concentrating on delivering them, and designs are becoming extremely complex 
as a result. Intel’s last three new IA32 microarchitectures were the Pentium® 
Processor, the Pentium Pro Processor (P6), and Willamette. As measured by 
lines of RTL code, these processors have increased in complexity by a factor of 
2.5x per generation: the Pentium Processor weighed in at 100K lines, the Pen- 
tium Pro Processor at 250K lines, and the new Willamette processor at 800K. 
Two other plausible indicators of design complexity are the total project design 
effort and the number of design errata found in testing. Both of those metrics 
have also increased at the rate of 2.5x per generation. 

Total volume shipments have risen dramatically, from a few million parts in 
the ’80’s to hundreds of millions today, and Intel’s experience with the Pentium 
Processor Floating Point Divider (FDIV) flaw is a stark reminder of how expensive 
a single design erratum can be: $475M for Intel to replace only 5M parts. 
Moreover, today’s much higher volumes are shipped much earlier in the product 
cycle; where only a few hundred thousand parts might have been sold in the 
first year of a new microprocessor in the ’80’s, today we ship millions within 
that crucial time period, where overlooked errata are most likely to be detected. 
This raises the stakes for “getting it right in the first place”, and yet that must 
somehow be accomplished on a shorter development schedule. 

These trends were already becoming clear when we began planning for the 
Willamette processor development effort, a new Intel IA32 CPU expected to 
become a product in 2000. Two fundamental goals of the processor were higher 
performance and higher clock rate (those are not always the same), and we did 
not know how to achieve them without a more complicated microarchitecture. 
To compensate for this additional complexity we decided to develop and deploy 
new formal verification techniques that were just becoming feasible. We had 
little corporate experience with this new technology, so we did not give up on 
the usual dynamic simulation methods, but we did allocate approximately 20 
project heads to FV out of the total pre-silicon validation allocation of 100 
people. 

We purposely did not charter the new FV team with errata detection, although 
we did hope they would find some (if for no other reason than to confirm 
the new technology was actually doing something useful!). Instead, guided by 
our scars with FDIV, we adopted the attitude that dynamic simulations would 
find the vast majority of pre-silicon errors, and when that set of techniques hit 
its marginal utility asymptote, FV might locate any remaining FDIV-like errata 
that might still be hiding in the rest of the design. 

Willamette’s FV team emphasis was primarily model-checking, with limited 
theorem-proving. One reason for this was that our structural RTL code was at a 
rather low level of abstraction, and there were no usable specifications 
written by the architects or designers up-front. We did do theorem proving on 
IEEE floating point units due to the existence of a specification, and since the 
creation of the necessary specification could then be leveraged by many other 
chip developments within the company. 

Our FV results have been very encouraging. On Willamette units that were 
formally verified, no silicon errata have been seen to date. Despite the learning 
curves and team-building overhead, the 20 FV engineers on Willamette managed 
to formally verify over 15% of the overall design. Based on this experience, we 
will do more extensive FV on future CPU developments. We do not expect to 
phase out dynamic simulation, however; we found over 8000 pre-silicon design 
errors with dynamic testing, and it remains a time- and labor-effective way to 
drive such errors out of hiding. Finding the best balance between these validation 
methods is the key. And even with both methods fully employed and balanced, 
we believe that future designs will have to begin taking product complexity into 
account at the architecture phase of the design if future products are to achieve 
the high quality necessary to keep the customers happy and the manufacturer 
out of trouble. 

Abstract. We describe a low-level proof format, which can be used for 
independent proof checking and as an intermediate language for trans- 
lating proofs between systems. The checker is presented as a virtual ma- 
chine and the proof format as the bytecode. We compare HOL and Coq 
with a view to designing this pivot language, and describe a prototype 
which converts recorded HOL proofs into this intermediate format, and 
then translates them into Coq. 


1 Communication between Proof Assistants 


There are several motives for wanting to enable communication between proof 
assistants. The most important is that users of one proof assistant might want to 
use proofs written with another. Another is using one system to check another. 
There is also an ecumenical interest in forging links between theorem proving 
communities. 

There has been some work in providing a general framework for different 
logics. The MathWeb [AHJ+00] and Open Mechanized Reasoning Systems projects 
[GPT94] aim to provide such a framework. However, it seems there is still 
much to do at the level of getting individual proof assistants, with their different 
logics, to understand each other. In this article we will examine the specific case 
of translating HOL [GM93] proofs to a Coq [BBC+97] readable format.¹ 

The translation between these two logics involves a number of non-trivial 
logical issues. Our approach is to represent proofs using a low-level intermediate 
proof format, which we use as an intermediate language. We have implemented 
a translator based on these ideas which accepts a wide variety of HOL proofs. 
We believe that the approach we have taken has potential for more general 
application. 

In order to communicate proofs, there must be a proof representation, some 
form of proof object [Bar96,BD93]. The starting point for this work is [Won99], 
which describes an extension of HOL with the ability to record proofs in a 
particular internal format (as a sequence of inference steps). A different approach 
is taken by the logical framework, LF [HHP87], which uses the dependently 
typed lambda calculus to represent proofs. An improved representation which 
avoids some redundancy has been given by Necula and Lee [NL98]. There have 


1 We use HOL 98 (Athabasca 5) and Coq V6.1. 
also been some specific translations between systems. Boyer and Dowek [BD93] 
implemented a proof checker for the Calculus of Constructions in Nqthm. The 
closest work to ours is that of Felty and Howe [FH97], who gave a translation 
of HOL terms into Nuprl. This work is complementary to ours, in that they 
concentrate on translating terms and ignore proofs, whereas we translate proofs 
but tackle a limited collection of terms. 

In Section 2, we look at the differences between HOL and Coq, in both the 
logics and the pragmatics of proof development. Then in Section 3, we describe 
the proof format as the bytecode which runs on an ‘idealised’ virtual machine. 
Although the checker has not been implemented, these ideas form the basis of 
a prototype translator which has been implemented. In Sections 4 and 5, we 
describe various features of the translation that have been implemented to date, 
and give an overview of the algorithm used to translate to and from this format. 
Finally, in Section 6, we discuss various ways in which our prototype system can 
be improved. 


2 Comparison of HOL and Coq 


Although HOL and Coq are both implementations of a form of higher-order 
logic over a version of the lambda-calculus, there are significant differences in 
the details of the logic and in the pragmatics of proof development, so that it 
can be surprisingly difficult for users of one system to understand the mind-set 
of those of the other. 

Zammit [Zam97] compares Coq and HOL, and we add some more details 
from the perspective of translation. We first compare the practicalities of proof 
construction, and then the details of the logics. 


2.1 Proof Development 


Proof Metalanguage HOL is an “LCF-style” theorem prover, meaning that 
the user uses the ML programming language directly within the proving 
session, in order to construct Hilbert-style proofs. Proof development is an 
interactive activity consisting of writing ML functions and invoking tactics, 
which is just evaluating ML expressions whose side-effects may alter the 
proof state. 

This use of an external metalanguage is really quite alien to the Coq paradigm; 
an interactive proof session and the saved proof script are entirely 
within the Coq proof language. 

Proof Modes There are two modes in Coq: top-level and proof mode. In proof 
mode, we set a goal and interactively prove a theorem. At the top-level, when 
not actually proving a theorem, we can check the type of a term, and so on. 
In Coq terminology, all the commands which alter the proof state during a 
proof, in proof mode, are called tactics. 

Although HOL also has two such modes, proofs are usually constructed at the 
top-level, which is just an ML session. There is an interactive goal-oriented 
facility but this is not primitive, and proofs constructed here are compiled 
down into combinations of forwards-style inference steps. 

Proof Style HOL is fundamentally a forwards-style (bottom-up) theorem pro- 
ver whereas Coq is backwards-style (top-down). Although, in practice, HOL 
tactics let proofs be carried out in a backwards style (top-down), this is just 
sugar for the forward inference steps. In fact, Coq also allows proof terms to 
be directly constructed in a forwards-style, but the interactive development 
language is backwards-style. 

This directionality influences the treatment of contexts in the two systems. 
For example, the HOL rule of modus ponens is: 


        Γ1 ⊢ P ⊃ Q    Γ2 ⊢ P 
        ---------------------- 
             Γ1 ∪ Γ2 ⊢ Q 


The corresponding step² in Coq is the cut rule: 


        Γ ⊢ P → Q    Γ ⊢ P 
        -------------------- 
               Γ ⊢ Q 


The point is that in HOL, we start with two judgements with arbitrary 
contexts, and so must combine these contexts in the conclusion. In Coq, 
since we work backwards, the single context is just duplicated. 

Tactics, Inference Rules, and Theorems In HOL terminology, the result of 
each inference step is a “theorem”. When working forwards, at each step 
there is a well-defined statement that has been proven at that point. These theorems 
are represented internally as the elements of a datatype thm, where a 
theorem is a list of assumptions and a conclusion. 
When developing a proof interactively in HOL, since the user must use ML 
to construct the proof, a tactic cannot be invoked directly. Rather, the 
user must type e(the_TACTIC), where e will evaluate the_TACTIC, and as a 
side-effect, change the proof state. Otherwise, a ‘proof’ is given directly, by 
defining a theorem via the inferences that prove it. For example, 


val TWOREFLS = CONJ (REFL (Term `1:num`)) (REFL (Term `2:num`)); 


binds the ML term TWOREFLS to a proof of 1 = 1 ∧ 2 = 2. The functions 
REFL and CONJ return terms of type thm but are not datatype constructors; 
the proof cannot be reconstructed from the value of TWOREFLS. 
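
The LCF discipline, together with the way forward rules merge contexts, can be imitated in a toy kernel. The Python sketch below is not HOL's implementation: terms are plain tuples, the rule names merely echo the HOL ones, and the private token is only a convention (Python cannot enforce the abstraction the way an ML abstract type does).

```python
from dataclasses import dataclass, field

_KERNEL = object()  # capability that only the inference rules below pass on


@dataclass(frozen=True)
class Thm:
    """A theorem: a set of assumptions and a conclusion, with no proof stored."""
    hyps: frozenset
    concl: object
    _key: object = field(default=None, repr=False, compare=False)

    def __post_init__(self):
        if self._key is not _KERNEL:
            raise ValueError("a Thm can only be produced by an inference rule")


def ASSUME(p) -> Thm:
    """{p} |- p"""
    return Thm(frozenset([p]), p, _KERNEL)


def REFL(t) -> Thm:
    """|- t = t"""
    return Thm(frozenset(), ("eq", t, t), _KERNEL)


def CONJ(th1: Thm, th2: Thm) -> Thm:
    """From G1 |- p and G2 |- q derive G1 u G2 |- p /\\ q."""
    return Thm(th1.hyps | th2.hyps, ("and", th1.concl, th2.concl), _KERNEL)


def MP(th_imp: Thm, th_p: Thm) -> Thm:
    """From G1 |- p ==> q and G2 |- p derive G1 u G2 |- q."""
    c = th_imp.concl
    if not (isinstance(c, tuple) and len(c) == 3
            and c[0] == "imp" and c[1] == th_p.concl):
        raise ValueError("MP: premises do not match")
    return Thm(th_imp.hyps | th_p.hyps, c[2], _KERNEL)


two_refls = CONJ(REFL(1), REFL(2))
```

The value two_refls holds only assumptions and a conclusion; nothing in it records that CONJ and REFL were used, which is why a separate recording mechanism is needed to obtain proof objects.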

The top-down notion of proof as a sequence of formulae, each of which is 
an axiom or follows from some previous members of the sequence using an 
inference rule, is not appropriate in a backwards style, since it does not really 
make sense to speak of having proved a theorem at each stage. Starting with 
a goal, and gradually destructing it until we reach True, say, we cannot 
directly³ say that we have proven anything before the proof is finished. 


2 Although the HOL tactic IMP_RES_TAC corresponds more literally, in the sense 
that it works backwards and so duplicates the context, we make this correspondence 
since both are primitive at the level of proof abstraction we will work at. 

3 Of course, we have indirectly proven a conditional theorem. 

In Coq, proofs are constructed interactively, or by loading the corresponding 
script from a file. 


Theorem TwoRefls : (1=1) /\ (2=2). 
Split. 
Reflexivity. 
Reflexivity. 
Qed. 


This results in the identifier TwoRefls being bound to a proof object — a 
lambda term — but this definition could be done directly, as in HOL. Proofs 
are structured into various units (theorems, lemmas, definitions, etc.) each 
of which can be constructed directly or interactively. The intermediate states 
in the proofs of these units are not stored. 

HOL tactics can do more than necessary, in the sense that many of the 
expanded primitive inferences are not used to construct the main theorem. 
User-defined tactics are used much more extensively in HOL than in Coq⁴. 
The user can, however, also define tactics in Coq, using basic tactics and 
tacticals. The Coq tactic language does not have powerful features such 
as recursion and pattern matching. If necessary, tactics can be written in 
Objective Caml and then linked to Coq’s code. This does not seem to happen 
often, however! 

Primitive Rules In HOL there is a well-defined set of primitive inference rules 
and axioms. Using these primitive rules and axioms, about 40 so-called basic 
rules are derived: introduction and elimination rules for the logical operators, 
congruence rules for equality, and so on. It is applications of these rules that 
are recorded in Wong’s HOL proof format [Won99], and treated as though 
primitive. Proofs using HOL tactics compile down into basic rules. 
This Hilbert-style approach is not really in the spirit of the Coq type- 
theoretic style of proof development. Although Coq is implemented in terms 
of about seven primitive tactics (Intro, Clear, Change, etc.), these have no 
special status in the language from the user’s point of view. A proof is not 
thought of as being a sugared sequence of ‘primitive’ steps. The user just 
does whatever it takes to find an inhabitant t of a type τ in the calculus 
of constructions, at which point the system checks that t does indeed have 
type τ. The term t is retained as a proof object. 

Contexts During Coq proofs, there is an explicit context, visible at all times. 
The context contains both variable declarations and propositional assump- 
tions. In fact, propositions are just a special case of variables because to 
assume P is to assume a proof H : P. We can add an assumption using the 
command, Variable n:nat, for example. In HOL, however, the ‘context’ is 
a list of assumptions which can be open. Free term and type variables are 
not explicit. 


* As pointed out in [Zam97], though, this is often more a measure of the low level at 
which HOL proofs would otherwise be carried out. 
2.2 Proof Language 


Propositions In HOL, there is no distinction between booleans and propositions. 
Propositions are manipulated just like any other terms. A direct 
consequence of this is that extensionality of booleans corresponds to the law 
of the excluded middle. 
In Coq, on the other hand, booleans bool are a type (with canonical closed 
terms true and false) in the (computational) universe Set, whereas pro- 
positions are more like types, but in the (non-computational) universe Prop. 
Thus terms can be ‘typed’ by propositions: if t : P, where P is a proposition, 
then t is a proof of P. 

Logic HOL uses classical logic whereas Coq is intuitionistic. This is not a signi- 
ficant difference since proofs in Coq can be given as terms in either the Set 
or Prop universes, and the latter can consistently be assumed as classical, 
by adding a classical axiom. 

Type Theory Coq uses a considerably more powerful type theory than HOL, 
the (inductive) calculus of constructions. The formulations of theorems and 
constructed objects are typically more complicated than in HOL. For exam- 
ple, HOL does not have dependent types, or full polymorphism. Expressing 
polymorphism with type variables rather than type quantifiers corresponds 
to Hindley-Milner polymorphism, rather than Girard-Reynolds. Inductive 
types are used extensively in Coq. Subset types (e.g. {n : nat | n > 0}) are 
a particular case of these. Moreover, Coq is Church-style (bindings must be 
explicitly typed), whereas HOL is Curry-style. 


In Section 4, we will see how the translation copes with these differences. 
First, though, we describe an intermediate format. 


3 Abstract Machine for Proof Checking 


By proof checker we mean a tool which takes a proof and a proposition, and 
decides whether or not the proof is valid for this proposition. This is unlike a 
proof assistant which is a tool used to construct such a proof, by a combination 
of automatic search and user interaction. Many proof assistants (such as Coq) 
contain a checker at their core. Some authors consider a checker to take a proof 
script and produce a proof object, but we will identify proof objects and scripts 
here. 

In this section, we discuss the possibilities for a low-level intermediate proof 
format, and then describe a proof checker which uses this format. 


3.1 Proof Format 


Proofs in HOL and Coq can be presented at several levels: in HOL, using tactics 
(which may invoke opaque decision procedures), as a sequence of basic inferences, 
as primitive inferences, or as the ML code which runs through the proof; in Coq, 
as (Coq style) tactics, as a sequence of internal manipulations to the proof state, 
or as a lambda-term. There is some choice, therefore, for the appropriate level 
at which to communicate proofs. 

The LF approach is to encode proofs as terms in the lambda-calculus where 
the inference steps are primitive constants and use this as a universal proof 
format. The proof object produced by Coq is a lambda-term. Checking this 
proof amounts to checking it is well-typed. The type can be reconstructed and 
is not given explicitly in the proof object. It is not clear how HOL could directly 
produce such an object. 

An alternative, however, is just to use the inference steps directly. This has 
the additional advantage of avoiding the mass of typing annotations that typi- 
cally appear in LF representations. Also, to a certain extent, basic tactics are 
independent of the specific details of a logic, such as whether the terms are typed 
or not. 

The next choice is whether the proofs should be in a backwards or forwards 
style. For reasons we will outline below, we adopt a backwards style, and show 
how to transform forwards-style HOL proofs. We start with a goal, which is 
then manipulated by the inferences. The proof is finished when we reach Qed 
and there are no more goals to prove. 

To check a forwards-style proof, that is, a sequence of theorems each inferred 
from its predecessors, we would have to constantly check whether we had reached 
the desired goal.⁵ At each stage, we would need to check that only proven 
theorems or axioms are used, or have two passes, where the first pass lists all the 
theorems, and the second checks their use. Either way, this is cumbersome. Also, 
it would be difficult to translate Coq’s native low-level into a forwards style. 

A backwards-style format offers two main advantages: 


— Intermediate theorems do not need to be stored.⁶ This reduces the size of 
proofs considerably, as well as the time needed to check a proof. It also redu- 
ces the complexity of the checker since, for each step, we just need to alter 
the proof state in a deterministic manner based on the current instruction, 
rather than check the validity of hypotheses, or compare theorems. 

— There is less need for explicit arguments to rules, reducing the size of a 
proof further. For example, in forwards-style, the assumption rule is given as 
argument a proposition, P, and returns the theorem P ⊢ P. Read backwards, 
the P is implicit; to prove the goal P ⊢ P, we need just appeal to the 
assumption rule. 


3.2 Abstract Machine 


There is an analogy between running a program and checking a proof. We pre- 
sent a proof checker as a virtual machine, for which the inference steps are the 
bytecode. 


5 Or, with exactly one goal, have a convention that the given proof steps are all 
necessary, and compare the result when finished. 

6 Although it is possible to just record the application of inference rules used to 
construct a sequence of inferred theorems, checking this would have to be done 
backwards. 
The idea is that the machine has a state that represents the proof checking as 
it proceeds, containing a stack of goals, the current context, the current propo- 
sition to be proven, and the list of steps to be followed. This is an alternative to 
requiring the proof format to carry such information. The machine will implicitly 
know what is required. 

Following the virtual machine analogy, we will think of a proof file as consi- 
sting of a number of components. In the simplified prototype translator we will 
describe in the next two sections, we will just have axiom and theorem com- 
ponents. However, we also envisage signature and inference rule components, 
though these are not implemented in our prototype and are not formalised here. 

The machine takes a collection of proofs, stored in the theorem component 
— an indexed collection of proofs. One of these will be the main theorem. The 
use of some other theorem corresponds to looking up the proof in the theorem 
component (like invoking a method), so there are no forward references. We place 
an ordering requirement to avoid circularity, by assigning each theorem a unique 
number. All proofs used are checked exactly once and there is a global state 
recording the checked status of each theorem. There is also an axiom component, 
listing the hypotheses on which the proofs are based. 

We allow references to both internal and external theorems. External theo- 
rems must be taken from some named source and are considered to have been 
checked. The checker does not have access to their proofs. In the case of some 
theory library we could imagine that the proofs are publicly available. A theo- 
rem might also be proved by a BDD or computer algebra package, or a model 
checker, say, in which case there is no proof, but the appeal to this external 
source should still be listed in an external theory component, which lists the 
external theories which are used and assumed to be checked, or trusted. 

A proof state consists of the current index, a list of goals to be proven, the 
current goal, that is, the current context and proposition, and a list of steps to 
be followed. 


goal = context × proposition 
proof state = index × list goal × context × proposition × list step 
context = list assumption 
assumption = proposition | typed variable 
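
For illustration only, the proof state above can be written down as a small set of Python types. This is a sketch under assumptions, not part of the prototype: plain strings stand in for propositions and steps (the term syntax is left open here), and the names `TypedVariable`, `ProofState` and `initial_state` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Assumption: strings stand in for propositions and steps.
Proposition = str
Step = str


@dataclass(frozen=True)
class TypedVariable:
    name: str
    type: str


# assumption = proposition | typed variable
Assumption = Union[Proposition, TypedVariable]
# context = list assumption (a tuple, so contexts can be shared safely)
Context = Tuple[Assumption, ...]
# goal = context x proposition
Goal = Tuple[Context, Proposition]


@dataclass
class ProofState:
    index: int                # index of the theorem being checked
    goals: List[Goal]         # stack of goals still to be proven
    context: Context          # context of the current goal
    proposition: Proposition  # current proposition
    steps: List[Step]         # instructions still to be followed


def initial_state(index: int, prop: Proposition, steps: List[Step]) -> ProofState:
    """Start checking theorem `index` in the empty context."""
    return ProofState(index, [], (), prop, list(steps))
```

Keeping the context inside each stacked goal mirrors the definition goal = context × proposition: a goal that is pushed now is later resumed in the context it was pushed with.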


We will allow the usual logical operators of predicate logic with equality, 
without specifying the term syntax beyond being a Church style lambda calcu- 
lus with some notion of occurrence and substitution. A suitable formalism for 
doing this would be the XML-based OpenMath [CCC99] standard. We distin- 
guish types, terms and propositions, however. The logic has the usual primitive 
logical operators, explicit contexts, and axioms for classical reasoning. We use 
the metavariables t for terms, occ for occurrences, x for variables, P for propositions, 
and thy for theory names. 

The machine is required to be able to check α-equivalence, inspect and replace 
subterms at given occurrences, and check that a formula has a particular 
form (e.g. an equality or a conjunction). We use the notations =α, 
subterm(occ, P), P[occ := t], and P[occs := t] for alpha equivalence, subterm at 
a given occurrence, and single and multiple substitution of terms at occurrences. 
We also use the notation P[t/x] for the substitution of a term for a variable. 


We will give transition rules for a minimal selection (we omit the rules for 
conjunction, disjunction and external theorems) of instructions: 


step ::= Ass | Thm i t* | Axiom i t* | Conj | LProj P | RProj P | Lor | 
         Ror | OrElim P P' | Cut P | Intro | Refl | Sym | Trans t | 
         Beta | Eta | Zeta | Absurd | Def x t | Unfold occ | Fold occ t | 
         Ext thy i t* | Gen occ t | Subst occ t | ExIntro t | ExElim x t | 
         App_cong 


(use assumption in current context, use theorem i with arguments, use axiom 
i, conjunction introduction, conjunction eliminations, or introductions and 
elimination, cut, implication or quantification introductions, reflexivity, symmetry, 
transitivity, beta, eta and zeta (lambda-congruence) equality rules, reasoning 
from false, definition, unfold and fold a particular occurrence with respect to a 
definition, use external theorem i in theory thy, generalise at occurrences occs, 
substitution, existential introduction and elimination, application congruence). 

The proof style used is closer to sequent calculus than Hilbert style. The code 
consists of low-level instructions for the various logical constructs, rewriting, and 
definitions. It differs from sequent calculus in its operational formulation, and in 
having explicit commands for definitions and the use of axioms and theorems. 

There are instructions for the introduction and elimination of the various 
logical operators. Following Coq, the Intro instruction is introduction for both 
implication and quantification. The elimination rules for implication and quan- 
tification are Cut and Gen, respectively. 


We distinguish two forms of substitution here: 


   ∀x.P          P[t/x]    t = t' 
  --------      ------------------- 
   P[t/x]            P[t'/x] 


The first rule read backwards becomes generalisation, Gen occ t. We indicate the 
term to generalise over by indicating the occurrence rather than giving the term 
to match against. We allow several arguments to Gen in order to generalise over 
several terms. However, we still need a substitution rule for the second rule. 


We write simultaneous substitution of t1,...,tn for x1,...,xn in P as 
P[tk/xk], where the range of the index k = 1,...,n is implicit. We write tuples 
using () brackets. Context extension is indicated as (Γ, x : τ) and (Γ, P). 

A selection of the transition rules is given in Figures 1 to 3. We write Thm i 
for Thm i [], and similarly for Axiom. We will write Qed as a synonym for True 
in the rules. The checking succeeds if the machine reaches Qed with no further 
goals to check. Otherwise, if we reach a state where no rule applies the checking 
fails. 
When an axiom or theorem is invoked the machine does a lookup to see if it 
matches the current goal. 


lookup_thm : index → proposition × list step 

lookup_axiom : index → proposition 


If a theorem has been invoked, it then looks to see whether it has been checked 
or not. If it is unchecked, and its index is greater than the current index, then its 
proof is checked in the empty context. If this succeeds it is marked as checked. 
If it has already been checked, or is an axiom, the goal is proven immediately 
and replaced with any hypotheses. In either case, the goal is popped off the 
stack. We give separate rules for Axiom and Thm depending on whether or not 
any arguments are given. 
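
The lookup discipline just described can be sketched as follows for the argument-free case. This is an illustration under assumptions rather than the machine itself: propositions are compared as strings instead of up to α-equivalence, the proof checker is passed in as a callable, and the names `TheoremComponent`, `use` and `check` are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

# Assumption: strings stand in for propositions and steps.
Proposition = str
Step = str


class TheoremComponent:
    """Indexed collection of (proposition, proof) pairs with a checked flag."""

    def __init__(self, theorems: Dict[int, Tuple[Proposition, List[Step]]]):
        self.theorems = theorems
        self.checked: Set[int] = set()   # global record of checked theorems

    def lookup_thm(self, j: int) -> Tuple[Proposition, List[Step]]:
        return self.theorems[j]

    def use(self, i: int, j: int, goal: Proposition,
            check: Callable[[int, Proposition, List[Step]], bool]) -> bool:
        """Try to close `goal` with theorem j while checking theorem i."""
        if not i < j:                      # ordering requirement, avoids circularity
            return False
        prop, steps = self.lookup_thm(j)
        if prop != goal:                   # theorem must match the current goal
            return False
        if j not in self.checked:
            if not check(j, prop, steps):  # check its proof in the empty context
                return False
            self.checked.add(j)            # checked(j) := true
        return True
```

Because the checked set is consulted before the proof is run, a theorem that is used many times is still checked only once.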

The instruction Def x t defines x to be t by adding the equality x = t to the 
local context. The instructions Fold and Unfold make use of definitions in the 
local context. 
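
To make the operational reading concrete, here is a minimal sketch of the machine restricted to a few instructions (Ass, Intro for implication, Cut, Refl, Sym, Trans, and the rule that pops the next goal at Qed). It is an illustration only: the theorem index is dropped, propositions are nested tuples chosen for this example, and Refl uses syntactic equality where the machine proper would test α-equivalence.

```python
QED = "Qed"


def imp(q, p):
    """The implication q ⊃ p."""
    return ("imp", q, p)


def eq(t, u):
    """The equation t = u."""
    return ("eq", t, u)


def step_once(goals, ctx, prop, steps):
    """Apply one transition; return the new state, or None if no rule applies."""
    if prop == QED:
        # Current goal is done: resume the next goal in its own context.
        if goals:
            (next_ctx, next_prop), rest_goals = goals[0], goals[1:]
            return rest_goals, next_ctx, next_prop, steps
        return None
    if not steps:
        return None
    instr, rest = steps[0], steps[1:]
    name = instr[0]
    if name == "Ass" and prop in ctx:
        return goals, ctx, QED, rest
    if name == "Intro" and isinstance(prop, tuple) and prop[0] == "imp":
        _, q, p = prop
        return goals, ctx + (q,), p, rest
    if name == "Cut":
        q = instr[1]
        return [(ctx, q)] + goals, ctx, imp(q, prop), rest
    if isinstance(prop, tuple) and prop[0] == "eq":
        _, t, u = prop
        if name == "Refl" and t == u:
            return goals, ctx, QED, rest
        if name == "Sym":
            return goals, ctx, eq(u, t), rest
        if name == "Trans":
            t3 = instr[1]
            return [(ctx, eq(t3, u))] + goals, ctx, eq(t, t3), rest
    return None


def check(prop, steps):
    """Run the machine from the empty context; True iff Qed is reached with no goals left."""
    state = ([], (), prop, list(steps))
    while True:
        goals, _, cur, _ = state
        if cur == QED and not goals:
            return True
        state = step_once(*state)
        if state is None:
            return False
```

For example, `check(imp("A", "A"), [("Intro",), ("Ass",)])` succeeds, whereas `check("A", [("Ass",)])` fails because no rule applies in the empty context.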


4 Logical Aspects of the Translation 


As mentioned above, in this simplified prototype, we will just have axiom and 
theorem components. We use the proof bytecode as a pivot language to translate 
between Coq and HOL. Translation from HOL to Coq is a two stage process. 


1. HOL → Pcode: Translation to a syntactically neutral bytecode, representing 
top-down proofs. 
2. Pcode → Coq: Conversion of bytecode to Coq commands. 


It would, of course, have been possible to give a direct translation, but the 
two stages are a natural and modular split. The first stage involves manipulation 
of HOL terms and the extraction of relevant information, whereas the second in- 
volves the construction of Coq terms. This modularity would be more important 
if we gave translations to and from a third logic. 

The input is a forwards-style HOL proof of the form: 


HOL_proof = goal × inference_step list 
inference_step = name × argument list × conclusion 
argument = proposition_in_context | term | type 
goal, conclusion = proposition_in_context 
proposition_in_context = proposition list × proposition 


An example of the concrete syntax of an inference step in HOL (in the format 
of [Won99]) is: 


{Just = [REC_THM|- 1 = 1, REC_THM|- 2 = 2], Tag = "CJ", 
Thm = |- (1 = 1) /\ (2 = 2)} 
Axioms and Theorems 


  lookup_thm(j) = (P, S') 
  (j, [], [], P, S') →* (j, [], Γ', Qed, S'') 
  ----------------------------------------------      unchecked(j), i < j 
  (i, G, Γ, P, Thm j :: S) → (i, G, Γ, Qed, S)        checked(j) := true 


  lookup_thm(j) = (∀x1:τ1 ··· xn:τn. P1 ⊃ ··· ⊃ Pm ⊃ Q, S') 
  (j, [], [], ∀x1:τ1 ··· xn:τn. P1 ⊃ ··· ⊃ Pm ⊃ Q, S') →* (j, [], Γ', Qed, S'') 
  --------------------------------------------------------------------------      unchecked(j), i < j 
  (i, G, Γ, Q[tk/xk], Thm j t1 ··· tn :: S) → 
  (i, (Γ, P2[tk/xk]) :: ··· :: (Γ, Pm[tk/xk]) :: G, Γ, P1[tk/xk], S)              checked(j) := true 


  lookup_thm(j) = (P, S') 
  ----------------------------------------------      checked(j), i < j 
  (i, G, Γ, P, Thm j :: S) → (i, G, Γ, Qed, S) 


  lookup_thm(j) = (∀x1:τ1 ··· xn:τn. P1 ⊃ ··· ⊃ Pm ⊃ Q, S') 
  --------------------------------------------------------------------------      checked(j), i < j 
  (i, G, Γ, Q[tk/xk], Thm j t1 ··· tn :: S) → 
  (i, (Γ, P2[tk/xk]) :: ··· :: (Γ, Pm[tk/xk]) :: G, Γ, P1[tk/xk], S) 


  lookup_axiom(j) = P 
  ------------------------------------------------ 
  (i, G, Γ, P, Axiom j :: S) → (i, G, Γ, Qed, S) 


  lookup_axiom(j) = ∀x1:τ1 ··· xn:τn. P1 ⊃ ··· ⊃ Pm ⊃ Q 
  -------------------------------------------------------------------------- 
  (i, G, Γ, Q[tk/xk], Axiom j t1 ··· tn :: S) → 
  (i, (Γ, P2[tk/xk]) :: ··· :: (Γ, Pm[tk/xk]) :: G, Γ, P1[tk/xk], S) 


Definitions 


  (i, G, Γ, P, Def x t :: S) → (i, G, (Γ, x:τ, x = t), P, S) 


  subterm(occ, P) = x 
  ---------------------------------------------------------      (x = t ∈ Γ) 
  (i, G, Γ, P, Unfold occ :: S) → (i, G, Γ, P[occ := t], S) 


  subterm(occ, P) = t 
  ---------------------------------------------------------      (c = t ∈ Γ) 
  (i, G, Γ, P, Fold occ c :: S) → (i, G, Γ, P[occ := c], S) 


Substitution 


  subterm(occ, P) = t' 
  --------------------------------------------------------------------------- 
  (i, G, Γ, P, Subst occ t :: S) → (i, (Γ, t = t') :: G, Γ, P[occ := t], S) 


Fig. 1. Transition Rules 
Equality Reasoning 


  (i, G, Γ, t =α t', Refl :: S) → (i, G, Γ, Qed, S) 


  (i, G, Γ, t = t', Sym :: S) → (i, G, Γ, t' = t, S) 


  (i, G, Γ, t1 = t2, Trans t3 :: S) → (i, (Γ, t3 = t2) :: G, Γ, t1 = t3, S) 


Fig. 2. Transition Rules cont. 


This will be translated simply to the instruction Conj. 

In Section 2, we described differences between HOL and Coq that are sig- 
nificant for the representation of proofs. We now describe how these logical 
differences are handled by the translation. 


Direction of Inference Translation between forwards and backwards styles is 
not a simple reversal since we must take account of dependencies introduced 
by subproofs (and definitions). 
To do this, the sequence of HOL steps is converted into a DAG, where 
hypotheses are child nodes. Although the proof corresponds to a unique 
tree, during a HOL session the branches can be constructed in any order, 
and so can appear in any order in the list of proof steps. Hence we must 
match theorems to hypotheses. The nodes are the names of inference steps 
plus any arguments necessary for backwards proof. The underlying tree is 
traversed and output in infix order as a sequence of instructions. 

Contexts The proof checking state uses an explicit context of variables and 
propositional assumptions. HOL inference steps which combine contexts in a 
specifically forwards style, such as CONJ and MODUS PONENS, can consistently 
be read backwards. Additional assumptions are propagated backwards au- 
tomatically. The step ADD_ASSUM which adds an assumption to the context 
of a theorem can be ignored. For example, the HOL proof 


               --------- 
                 B ⊢ B 
 ---------    ------------ 
   A ⊢ A        B, C ⊢ B 
 --------------------------- 
      A, B, C ⊢ A ∧ B 


will become 


 A, B, C ⊢ A      A, B, C ⊢ B 
 ------------------------------ 
       A, B, C ⊢ A ∧ B 


Prop and Bool It is fundamental to the classical nature of HOL that booleans 
are used as propositions. However, it would be inconsistent in Coq to identify 
equality of propositions and logical equivalence so equality of booleans is 
translated into an iff. In fact, since equality in HOL is extensional, unlike 
in Coq, we must also perform eta-expansion. Therefore, we translate all 
Quantifications 


  (i, G, Γ, P, Gen occ1, ..., occn :: S) →                       (x ∈ Γ) 
  (i, G, Γ, ∀x:τ.P[occ1, ..., occn := x], S) 


  (i, G, Γ, ∀x:τ.P, Intro :: S) → (i, G, (Γ, x:τ), P, S) 


Other Logical Rules 


  (i, G, Γ, P, Ass :: S) → (i, G, Γ, Qed, S)                     (P ∈ Γ) 


  (i, (Γ, P) :: G, Γ', Qed, S) → (i, G, Γ, P, S) 


  (i, G, Γ, P, Absurd :: S) → (i, G, Γ, ⊥, S) 


  (i, G, Γ, Q ⊃ P, Intro :: S) → (i, G, (Γ, Q), P, S) 


  (i, G, Γ, P, Cut Q :: S) → (i, (Γ, Q) :: G, Γ, Q ⊃ P, S) 


Fig. 3. Transition Rules cont. 


equalities into an eta-expanded form, replacing equality on booleans with 
equivalence. 
For example, if f and f' have type num → bool, then f = f' will be translated 
to 


(x_1385, x_1386 : nat) x_1385 = x_1386 → ((f x_1385) ↔ (f' x_1386)) 
(the variables being generated automatically) and the HOL step 


 F = F'     a = a' 
 ------------------ 
    F a = F' a' 

will be translated to Cut a=a'; Generalize a a'. These Coq commands 
take the goal first from F a = F' a' to the subgoals a = a' → F a = F' a' and 
a = a', and then to ∀x.∀x'. x = x' → F x = F' x' and a = a'. 
The HOL step ‘equality modus ponens’ 

 P = Q     P 
 ------------ 
      Q 


translates to the transition 


(i, G, Γ, Q, Iff_cut P :: S) → (i, (Γ, P) :: G, Γ, P ↔ Q, S). 
In Coq, we represent this as the lemma 
iff_imp1 : (P,Q : Prop)(P ↔ Q) → (P → Q). 


The rules EQ_IMP_RULE_L, EQ_IMP_RULE_R, IMP_ANTISYM_RULE, and the asso- 
ciated axiom, IMP_ANTISYM_AX are translated similarly. 

The other inferences to do with equality — reflexivity, symmetry, transitivity, 
congruence — also have propositional versions. This leads to complications, 
in particular, with the congruence rules (for example, the HOL rule MK_COMB 
and AP_TERM). 

Theorems In HOL, a theorem is a list of hypotheses, paired with a conclusion. 
This is translated to a single proposition consisting of a quantification over 
free type and term variables and a nested implication from hypotheses to 
conclusions. Since HOL proofs work implicitly on the conclusion, each Coq 
proof must first Intro each of the assumptions. 

We first explain how lemmas are used during a Coq proof. If the current goal 
is P, and lemma T directly proves P, then we can solve the goal immediately 
with the command Exact T. If T proves A → P, then we can use the 
command Apply T, which will replace the goal with A. However, if the goal 
is B → C and we want to use the lemma A → B → C then Coq is unable to do 
the match. We have to first do an Intro, then apply the lemma, matching 
against C. 

With either command, we can also ‘pass’ arguments to theorems. For example, 
if the goal is P[t/x] and T proves ∀x : τ.P, then we can use Exact (T t). 

Church-style type annotations Since Coq requires explicit quantification, 
we must add quantifiers over term and type variables wherever necessary. 
This means though that where a HOL proof would expect to be working on 
an unquantified proposition, the corresponding Coq proof must insert the 
appropriate number of Intro’s. In practice, this means that lemma proofs 
must begin by introducing all the ‘extra’ quantifications, that is, the free 
variables, and the hypotheses. 
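
As a rough illustration of the last two points, the function below closes a HOL-style theorem (free variables, hypotheses, conclusion) into a single quantified implication and reports how many Intro steps a proof of it must begin with. It is a sketch only: the function name `close_theorem`, the string representation, the `forall` notation and the choice of `Set` as the type of type variables are all assumptions made for this example.

```python
def close_theorem(type_vars, term_vars, hyps, concl):
    """Return (closed proposition, number of leading Intro steps).

    type_vars: names of free type variables
    term_vars: (name, type) pairs for free term variables
    hyps, concl: hypotheses and conclusion, as plain strings
    """
    body = concl
    for h in reversed(hyps):      # nested implication h1 -> ... -> hn -> concl
        body = f"({h}) -> {body}"
    # Assumption: type variables are quantified over Set.
    binders = [f"forall {a}:Set, " for a in type_vars]
    binders += [f"forall {x}:{ty}, " for x, ty in term_vars]
    n_intros = len(type_vars) + len(term_vars) + len(hyps)
    return "".join(binders) + body, n_intros
```

One Intro is needed per quantified variable and one per hypothesis, which is why lemma proofs start with a block of introductions before any HOL step is replayed.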


5 Translation Algorithm 


We have implemented a prototype translator in about 750 lines of ML, which 
accepts a subset⁷ of HOL proofs and outputs a Coq-readable file. The basic 
idea of the algorithm is to first represent the proof as a DAG, where each node 
corresponds to a theorem, in the HOL sense, and children are hypotheses. Then 
any shared hypotheses can be factored out as lemmas, and leaf nodes will become 
axioms. This gives the architectural organisation of the proof. As for translating 


7 Apart from a few basic inference steps [GM93] that we have just not got round 
to implementing (type instantiation, negation rules, and a few others), the main 
omissions are the steps for axioms and definitions. These are used by the definitional 
packages for making recursive function and type definitions, and treatment of this 
is the main topic of [FH97]. 
individual steps, there is some choice for how much detail can be put in the 
bytecode, and how much can be taken for granted. At one extreme, we could 
write a tactic in Coq, HolTac say, that mimicked HOL, and simply translate 
proofs to a single application of HolTac. Instead, we translate the proof, step by 
step, as it is. 

We do not use the instruction set of Section 3.2 directly, but modify it for 
ease of implementation. There are two main differences from the virtual ma- 
chine described above. The first is that we use HOL terms. This simplifies the 
implementation considerably since we can rely on the HOL system to mani- 
pulate terms. The second comes from the fact that certain inference steps in 
HOL correspond to several steps in the pivot language, and in Coq. However, 
we have not implemented the inference rule component. Thus, instead of writing 
the lemmas in the pivot, we translate them directly into Coq lemmas and (more 
generally) tactics. We extend the instruction set with a number of ad hoc instructions, 
such as Iff_sym (symmetry of iff) and AE_EX (application congruence 
for ∃). The generated Coq proofs must first load a file with Coq code (lemmas 
and tactics) for these instructions. For example, the instruction Iff_sym will be 
translated to an application of the lemma iff_sym, where 


iff_sym : (P,Q : Prop)(P ↔ Q) → (Q ↔ P). 


There are extra arguments to Intro1 and Zeta, giving the variables which 
these steps introduce to the local context. If the translator also implemented a 
checker these variables would be implicit. Similarly, the application congruence 
instructions, App_cong1 and App_cong2 (one for when the function terms are the 
same, and one for when they differ) also need extra arguments, for subterms that 
would be implicit. We do not need to pass arguments to axioms and theorems. 

The datatype of bytecode instructions is: 


datatype instr = Ass | Thm of int | Axiom of int | Conj | 
    LProj of term | RProj of term | Lor | Ror | 
    OrElim of (term*term) | Cut of term | Intro | 
    Intro1 of term | Refl | Sym | Trans of term | Beta | Eta | 
    Zeta of term | Absurd | Def of (term*term) | 
    Unfold of occurrence | Fold of (occurrence*term) | 
    Ext of (string*int) | Gen of occurrence | 
    Subst of (occurrence*term) | ExElim of term*term | ExIntro of term | 
    Iff_cut of term | App_cong1 of (term*term) | App_cong2 of term*term | 
    Nop | Genterm of (term*term) | Genterm_I of (term*term*term) | 
    Error of string | Iff_refl | Iff_sym | Iff_trans of term | 
    AE_FA of (term*term*term) | AE_EX of (term*term*term) | 
    MC_FA of (term*term*term) | MC_EX of (term*term*term); 


The translation consists of four passes, of which we combine the first two. 
The input to the algorithm is the list of proof steps output by the HOL proof 
recorder. The first phase uses HOL functions, for example, for finding the types 
of terms. Since the bytecode uses HOL terms, the second phase must also use 
HOL functions but, in principle, it should be independent of HOL. 


122 E. Denney 


Conversion of individual steps to bytecode The function trans_step 
translates a HOL inference step, consisting of hypotheses, arguments, theo- 
rem concluded, and name of the inference rule, into a bytecode instruction. 
In general, most of the information is discarded. For example, modus ponens 
is translated as follows: 


"MP" => let val _::h2::[] = hyps in Cut (concl h2) end 


Here, hyps is the list of hypotheses for this step. 

Creation of proof DAG The list of HOL inference steps is converted into a 
DAG format. Nodes represent either unproven theorems, or inference steps 
linked to their hypotheses. 


datatype label = COUNT of int | LEMMA of int; 
datatype node = 
THM of Thm.thm | 
INF of (string * dag ref ref list * arg list * Thm.thm) 
and 
dag = DAG of node * label; 


Each node of the DAG is given a label. During the creation of the DAG, the 
label keeps a count of how many parents a node has (that is, how often it is 
used as a hypothesis). The steps are processed from top to bottom. The first 
step, which is assumed to conclude the main theorem, gives the root of the 
DAG. At any stage there is a list of global hypotheses. This is initialised to 
the list of hypotheses of the first step. The conclusions of subsequent steps 
are then matched against the list of global hypotheses. If there is no match 
the step is discarded. Otherwise, a new node is created for this step and 
the matched hypotheses are made to point to this node. The matches are 
removed from the list of global hypotheses, and the hypotheses of this step 
are added. 

We assume that the first step encountered will be the one that concludes the 
main theorem. Since the top-down ordering will be respected by the HOL 
proof recorder, it follows that if a step cannot be added to the current proof 
DAG, then it is not needed in the proof of the main theorem. 
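
The matching procedure described above might be sketched as follows. This is an illustration under assumptions, not the ML implementation: theorems are plain strings compared for equality, each step is a hypothetical `(rule name, hypotheses, conclusion)` triple, and `Node` and `build_dag` are names invented for the example.

```python
from typing import List, Tuple, Union


class Node:
    def __init__(self, name: str, concl: str, hyps: List[str]):
        self.name = name
        self.concl = concl
        # Each child starts as an unproven theorem (a string) and is
        # replaced by a Node once a step concluding it has been found.
        self.children: List[Union["Node", str]] = list(hyps)
        self.count = 0   # how often this node is used as a hypothesis


def build_dag(steps: List[Tuple[str, List[str], str]]) -> Node:
    """Build the proof DAG; steps[0] must conclude the main theorem."""
    name, hyps, concl = steps[0]
    root = Node(name, concl, hyps)
    # Global hypotheses: (theorem, parent node, position in the parent).
    pending = [(h, root, k) for k, h in enumerate(hyps)]
    for name, hyps, concl in steps[1:]:
        matches = [p for p in pending if p[0] == concl]
        if not matches:
            continue     # step is not needed for the main theorem: discard
        node = Node(name, concl, hyps)
        for _, parent, k in matches:
            parent.children[k] = node
            node.count += 1
        # Remove the matches and add the hypotheses of this step.
        pending = [p for p in pending if p[0] != concl]
        pending += [(h, node, k) for k, h in enumerate(hyps)]
    return root
```

Hypotheses that are never matched stay behind as strings; these correspond to the unproven theorem nodes that later become axioms.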

Creation of proof lists The DAG is traversed from the root upwards, reading 
off steps into the current lemma. If the count label is greater than one, then a 
new lemma is created, the label is changed to LEMMA n, where n is the current 
lemma number, and the traversal continues from there. Finally, theorem 
nodes without hypotheses are translated to axioms. Hypotheses are traversed 
from left to right, with one exception: the hypotheses of the HOL inference 
step CHOOSE are in the opposite order from those in the corresponding step 
of Coq. 
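
A minimal sketch of this traversal is given below, assuming a simplified node type in which unproven theorems are plain strings. It emits rule names in place of real bytecode instructions and does not model the CHOOSE exception; the names `Node` and `proof_lists` are hypothetical.

```python
from typing import Dict, List


class Node:
    def __init__(self, name, children=(), count=1):
        self.name = name                # inference rule name
        self.children = list(children)  # Node, or str for an unproven theorem
        self.count = count              # how often it is used as a hypothesis
        self.lemma = None               # lemma number once factored out


def proof_lists(root: Node):
    """Return (lemma number -> instruction list, axioms); lemma 0 is the main proof."""
    lemmas: Dict[int, List[tuple]] = {}
    axioms: List[str] = []

    def emit(node, out):
        out.append((node.name,))
        for child in node.children:     # hypotheses, left to right
            walk(child, out)

    def walk(node, out):
        if isinstance(node, str):       # unproven theorem: becomes an axiom
            if node not in axioms:
                axioms.append(node)
            out.append(("Axiom", axioms.index(node)))
        elif node.count > 1:            # shared hypothesis: factor out a lemma
            if node.lemma is None:
                node.lemma = len(lemmas) + 1
                body: List[tuple] = []
                lemmas[node.lemma] = body
                emit(node, body)
            out.append(("Thm", node.lemma))
        else:
            emit(node, out)

    main: List[tuple] = []
    emit(root, main)
    lemmas[0] = main
    return lemmas, axioms
```

A node that is shared is traversed once, when its lemma is created, and every later use is reduced to a reference to that lemma.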

Conversion of bytecode proofs to Coq The axioms are first converted, and 
then each of the lemmas. We use two mutually recursive translation functions 
coqpp and expand_equiv, which pretty-print terms and equalities, respectively, 
in a Coq-readable format. These are called, in turn, by instr2coq 
and prop2coq. The function instr2coq is the core of this stage of the conversion, 
and incorporates the specific Coq tactics which correspond to each 
bytecode instruction. For example, 


Cut t => (dup (); "Cut " ^ (prop2coq t)) 


The function prop2coq displays a proposition, stored as a HOL term, by dis- 
playing quantifiers for the type and term variables, and then calling coqpp. 
Only those variables which would be free at this point in the proof should 
be quantified over. This means that we have to keep track of the varia- 
ble context. In fact, there is a stack of contexts, initially set to empty for 
each lemma. The function instr2coq will alter the stack depending on the 
instruction processed. If the instruction performs an Intro1 x, then x is 
pushed onto the stack, that is, is added to the head of the top context on 
the stack. If the instruction solves the current goal, then the stack is popped. 
If, like Cut, the instruction splits the current goal into two subgoals, then 
the top element is duplicated. 
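
The bookkeeping for the variable context can be pictured with the small sketch below. It is illustrative only: the class and method names are hypothetical, and it assumes that each lemma starts with a single empty context on the stack.

```python
class ContextStack:
    """Stack of variable contexts, one entry per open goal."""

    def __init__(self):
        # Assumption: one empty context for the initial goal of a lemma.
        self.stack = [[]]

    def intro(self, var):
        """Intro1 x: add x to the head of the top context."""
        self.stack[-1].insert(0, var)

    def solved(self):
        """The instruction solves the current goal: pop its context."""
        self.stack.pop()

    def dup(self):
        """The instruction splits the goal in two (e.g. Cut): duplicate the top."""
        self.stack.append(list(self.stack[-1]))

    def free_vars(self):
        """Variables that may be quantified over at this point in the proof."""
        return list(self.stack[-1])
```

Duplicating the top context copies the list, so variables introduced while proving one subgoal do not leak into the other.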


6 Conclusion 


We have presented a compact proof format which can be used both for indepen- 
dent proof checking, and as an intermediate language for the translation between 
HOL and Coq. We obtain small proofs, comparable with the results of [NL98], 
and which place minimal requirements on a checker. 

The proof format is used for translation from HOL to Coq but is fairly 
independent of either format. In general, there is not a one-to-one correspondence 
between instructions and proof steps in either HOL or Coq. 

Most appeals to classical reasoning in HOL proofs are translated to axioms 
in Coq proofs. The major difference between Coq and HOL turned out to be 
HOL’s representation of propositions as booleans, and its associated use of equa- 
lity reasoning. As commented on in [Zam97], in contrast to HOL, Coq offers 
little support for the ‘equality-like’ nature of iff. In consequence, the translation 
generates quite ugly proofs, and better results could perhaps be obtained by 
translating at a higher level. 

We have not yet reached the ideal system outlined in Section 3. The main task 
is to make the bytecode independent of HOL by having its own notion of term, 
and adding an inference rule component. The OpenMath standard [CCC99] 
could be used for term syntax. 

As well as only translating a subset of HOL proofs, it is possible that the 
generated proofs might fail because they rely on a term matching that Coq 
cannot manage. Rather than pessimistically add as much detail as could be 
needed, the translator could open a dialogue with Coq (such as suggested in 
[BSBG98]) in which Coq is asked if it can perform a certain matching; if not, 
sufficient detail is added until it can. Another problem is that a HOL lemma 
which is an equality may be used both as a ‘true’ equality, and as an equivalence. 
Since we translate all booleans to propositions, our algorithm will only produce 
one of these forms. Some mechanism of tagging ‘true’ booleans in the original 
proof could be used. 

We have not given any correctness result. Formalising this would involve, 
at least, giving a translation from Coq into the bytecode, so that translation 
between HOL and Coq could be judged correct if it commutes with the respective 
translations into the bytecode. However, if we are satisfied that a proposition has 
been translated correctly, then it does not matter whether we have established 
in advance that a proof will be correctly translated. This is guaranteed if the 
translated proof is accepted by Coq. 

We gave a classical translation into the non-computational Prop universe of 
the Calculus of Constructions. An alternative translation could be given which 
aimed to translate existentials to the computational form whenever possible. 

We claimed that the bytecode can be used both for proof translation, and for 
proof checking, but the checker has not been implemented, although this would 
be straightforward. Since we need to keep track of the context as the proof is 
being translated anyway, it would make sense to combine the two. 

A significant extension would be to write a translation in the reverse direc- 
tion, that is, from (a suitable subset of) Coq to HOL. Since Coq has true proof 
objects, we could start with either the lambda terms, or the proof script. In either 
case, since Coq is top-down, we would need to run the machine in order to 
generate information that is implicit, and necessary for HOL’s bottom-up style. 
An even greater challenge would be to give translations to and from a third proof 
assistant. 
Abstract. We verify within the Coq proof assistant that ML typing is 
sound with respect to the dynamic semantics. We prove this property in 
the framework of a big step semantics and also in the framework of a 
reduction semantics. For that purpose, we use a syntax-directed version 
of the typing rules: we prove mechanically its equivalence with the initial 
type system provided by Damas and Milner. This work is complementary 
to the certification of the ML type inference algorithm done previously 
by the author and Valérie Ménissier-Morain. 


1 Introduction 


The piece of work presented in this paper supplements the certification laid out 
in [6] whose purpose was to verify in Coq the soundness and the completeness 
of the ML type inference algorithm with respect to the typing rules. We now 
connect the typing rules with the dynamic semantics and verify that the type 
system ensures a strong typing: a well-typed program cannot then produce type 
errors during its execution or, according to Milner’s slogan [13], Well-typed pro- 
grams do not go wrong. Thus the whole formal development presented in both 
this paper and [6] constitutes a machine-checked certification of the different 
aspects related to the ML typing discipline. More precisely we provide for a 
functional kernel of the ML language a formalization of the type system in the 
Calculus of Inductive Constructions and also a formalization of the type inference 
algorithm well-known in the literature as the algorithm W. We prove within the 
Coq tool that W is correct and complete with respect to the typing rules. Com- 
pleteness means here that if an expression is well-typed according to the typing 
rules then W succeeds and computes the principal type of the expression. The 
formal development contains also the definition of the dynamic semantics and 
establishes the soundness property. In this study, we consider a syntax-directed 
version of the typing rules. This version is often used in the ML community but 
it is not the one proposed initially by Damas and Milner [13]. Thus we formalize 
within Coq the initial version and prove mechanically the equivalence of both 
type systems. As far as we know, no publication mentions such a complete 
mechanized certification of ML typing aspects ([16,8,22] address only 
type soundness; [15] is another certification of W, in Isabelle/HOL). 


The formulation and the proof of the type soundness property are intimately 
bound to the formulation of the dynamic semantics of the language. For example, 
for ML, Milner used a denotational semantics, Tofte used a big step semantics, 
and Wright and Felleisen a reduction semantics. Machine-checked proofs of type 
soundness are often based upon a big step semantics or a reduction semantics. For 
instance, Syme [19] considers a reduction semantics to prove Java type soundness 
whereas Nipkow and von Oheimb [14] refer to a big step semantics. For ML, 
Terrasse in [21] uses a big step semantics but deals with the monomorphic case 
(Coq); Michaylov and Pfenning in [16] also use a big step semantics but take 
into account the polymorphic typing by substituting expressions (Elf). Lastly, in 
[3], A. Bove uses a reduction semantics but in the restricted monomorphic case 
(ALF). As far as we are aware, our machine-checked proof of ML type soundness 
is the first published one that deals with the notion of type scheme. 


The rest of the paper is organized as follows. Section 2 presents the forma- 
lization of the ML kernel we consider together with its type system (and the 
involved notions, e.g. substitutions). This part is another presentation (less 
detailed and less technical) of sections 3, 4 and 5 of [6]. Consequently, the 
choices made for verifying W, particularly the decision to stick to a functional 
implementation of W, have an impact on the type soundness part. Then our paper deals 
with type soundness, first in the framework of an evaluation semantics (big step) 
and then in the context of a reduction semantics (small step). The last section 
connects our formalization with the type system provided by Damas and Milner. 


We assume here familiarity with the Calculus of Inductive Constructions. We 
use version 6.1 of the Coq proof assistant [1]. In order to make this paper more 
readable, we adopt sometimes a pseudo-Coq syntax which differs slightly from 
the usual Coq syntax. Our paper provides the definitions of most concepts, the 
key lemmas but almost no proofs. The complete development is accessible on 
the Internet via http://www.univ-evry.fr/labos/lami/specif/dubois. 


2 The Type System 


2.1 The Kernel of the ML Language 


The expressions we consider are natural number constants, identifiers (x), 
λ-abstraction (λx.e), application (e e’), let binding (let x = e in e’) and recursive 
functions (Rec f x.e). 

These expressions are described in Coq as an inductive data type (expr) 
with constructors for each kind of expressions. The type ident is the type of 
the identifiers. It does not matter what it is exactly provided that the equality 
of two identifiers is decidable. 


Inductive expr: Set := 
Const: nat -> expr | Variable: ident -> expr 
| Lam: ident -> expr -> expr | Rec: ident -> ident -> expr -> expr 
| App: expr -> expr -> expr | Let_in: ident -> expr -> expr -> expr 
2.2 Types and Type Schemes 


Types consist only of the basic type nat, type variables denoted as usual with 
Greek letters α, β ... and functional types τ → τ’ (where τ and τ’ are types 
too). They are encoded in Coq as: 


Inductive type: Set := 
Nat: type | Var: stamp -> type | Arrow: type -> type -> type 


The type variables, whose type is stamp, are essentially natural numbers. It 
is also a choice close to implementations. In the following, Coq terms like (Arrow 
t1 t2) are sometimes written t1 → t2. 

In order to express parametric polymorphism, it is necessary to specify type 
schemes: a type scheme, of the form ∀α1,α2,...αn.τ, is a type with some 
quantified type variables. The quantified variables are called the generic variables of 
the type scheme. A type scheme without generic variables is called a trivial type 
scheme and written as ∀.τ. 

In order to simplify the manipulation of free and bound variables, we 
distinguish syntactically free and bound variables in a type scheme. Consequently 
we define inductively the type type_scheme with two different constructors for 
variables, Gen_var for bound ones and Var_ts for the free ones. 


Inductive type_scheme: Set := Nat_ts: type_scheme 
| Gen_var: stamp -> type_scheme | Var_ts: stamp -> type_scheme 
| Arrow_ts: type_scheme -> type_scheme -> type_scheme 


According to this definition, the type scheme ∀α.α → β is represented by 
the Coq term (Arrow_ts (Gen_var alpha) (Var_ts beta)) where alpha and 
beta are the stamps associated to α and β respectively. We may also write this 
Coq term as (Gen_var alpha) → (Var_ts beta) in a pseudo-Coq syntax. 

The choice of this representation for type schemes has an important impact 
on the formal development. It allows us to define many operations on type 
schemes by case analysis and to proceed by induction in a lot of proofs. However this 
choice gives no help where α-conversion is concerned. In order to smooth away 
this difficulty, it would be interesting to use higher order abstract syntax [5] to 
represent variable bindings in type schemes. But this representation does not 
admit induction (for the moment): this is a decisive drawback for us. 
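
As an informal illustration of this representation (a sketch in Python, not part of the Coq development), the four constructors of type_scheme can be mirrored by four small classes. The class names and the helper functions free_vars and generic_vars are assumptions made for this example only; they show how the syntactic separation of free and bound variables reduces both computations to a plain case analysis.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class NatTs:
    """The basic type nat, seen as a type scheme (Nat_ts)."""


@dataclass(frozen=True)
class GenVar:
    """A bound (generic) variable, identified by its stamp (Gen_var)."""
    stamp: int


@dataclass(frozen=True)
class VarTs:
    """A free variable, identified by its stamp (Var_ts)."""
    stamp: int


@dataclass(frozen=True)
class ArrowTs:
    """A functional type scheme (Arrow_ts)."""
    left: "TypeScheme"
    right: "TypeScheme"


TypeScheme = Union[NatTs, GenVar, VarTs, ArrowTs]


def free_vars(ts: TypeScheme) -> list:
    """Stamps of the free variables, obtained by case analysis."""
    if isinstance(ts, VarTs):
        return [ts.stamp]
    if isinstance(ts, ArrowTs):
        return free_vars(ts.left) + free_vars(ts.right)
    return []  # NatTs and GenVar contain no free variable


def generic_vars(ts: TypeScheme) -> list:
    """Stamps of the generic variables, obtained the same way."""
    if isinstance(ts, GenVar):
        return [ts.stamp]
    if isinstance(ts, ArrowTs):
        return generic_vars(ts.left) + generic_vars(ts.right)
    return []


# The scheme "forall alpha. alpha -> beta", with arbitrary stamps 0 and 1.
ALPHA, BETA = 0, 1
scheme = ArrowTs(GenVar(ALPHA), VarTs(BETA))
```

With these definitions, free_vars(scheme) yields only the stamp of β and generic_vars(scheme) only the stamp of α, without any bookkeeping of a binder list.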


2.3 Type Environments 


The type information about the free identifiers of an expression is contained in 
a type environment (environment for short when there is no ambiguity) denoted 
in the rest of the paper by Γ or env. Because of polymorphism, environments 
contain type schemes. Thus an environment can be considered as a partial 
function from identifiers to type schemes. In Coq we represent an environment as 
a list of associations between identifiers and type schemes. Thus the type of 
environments, type_env, is defined as list (ident * type_scheme). 
The operation assoc_ident_in_env that finds the type scheme associated to 
an identifier (it is also written informally as Γ(x)) may fail. Thus this operation 
has the type ident -> type_env -> (option type_scheme) where the type 
option defined below makes it possible to simulate the exception mechanism. 


Inductive option [A : Set] : Set := 
None : (option A) | Some : A -> (option A) 


The extension of an environment is done by the operation add_env, 
implemented as a simple list addition (a classical cons). We also use the informal 
notation Γ ⊕ x : σ (where x is an identifier and σ a type scheme). 
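
The following Python sketch illustrates the association-list view of environments; it is an illustration under assumptions, not the Coq code. Type schemes are stood in for by tagged tuples, and Python's None plays the role of the None constructor of option, while a returned scheme corresponds to (Some ts).

```python
def add_env(env, x, ts):
    """Extend env with the binding x : ts (a classical cons at the front)."""
    return [(x, ts)] + env


def assoc_ident_in_env(x, env):
    """Return the scheme bound to x, or None when x is unbound.

    The list is scanned from the front, so the most recent binding of an
    identifier hides the older ones.
    """
    for ident, ts in env:
        if ident == x:
            return ts
    return None


# Two successive bindings for the same identifier: the later one wins.
env = add_env(add_env([], "x", ("nat_ts",)), "x", ("gen_var", 0))
```

Because add_env builds a new list, the original environment is left unchanged, which matches the functional style of the Coq specification.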


2.4 Substitutions and Instances 


The literature provides different definitions for the notion of substitution, which 
are not all equivalent (see [11] for a survey). We consider a substitution to be 
a function s from the set of type variables to the set of types, such that the 
domain, that is {x : stamp | s(x) ≠ x}, is finite: then s behaves like the identity 
anywhere else. 

Substitutions are undeniably fundamental objects in our mechanized veri- 
fication, but in fact they are brought indirectly by two instance relations: the 
instance relation between two types (type instance) and the instance relation 
between a type and a type scheme (generic instance). 


The type τ is a type instance of the type τ’ if there exists a substitution 
s such that sτ’ = τ. 

The type τ is a generic instance of the type scheme ∀α1,...,αn.τ’ if there 
exists a substitution s whose domain is {α1,...,αn} such that sτ’ = τ. 


Consequently we distinguish two kinds of substitutions: 


— the so-called free substitutions (or substitutions), that can work only on the 
free variables of a type, a type scheme or an environment. 

— the so-called generic substitutions that can work only on the generic variables 
of a type scheme. 


The first ones are represented in Coq as association (between type variable and 
type) lists. The generic substitutions are represented by type vectors, without 
any reference to the names of the variables they are concerned with. They are 
used with the requirement that the type of the ith generic variable is located at 
the ith position in the vector. 

Many operations come with the definition of substitutions: application of a 
substitution on a type variable, a type, a type scheme, composition of substi- 
tutions, domain, range, free variables of a substitution ... These operations are 
specified in Coq in a functional style and are very close to their ML implemen- 
tation. 
The choice of representing substitutions as association lists makes some 
operations (e.g. the composition) complex. The proof that the composition 
really does what is expected of it is quite clumsy (about 600 lines). 
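
To make the association-list representation concrete, here is a small Python sketch of three of the operations mentioned above: application to a type, domain, and composition. It is only an illustration; the tuple encoding of types and the function names apply_subst, domain and compose are assumptions of this example and are not taken from the Coq development.

```python
# Types are tagged tuples: ("nat",), ("var", stamp), ("arrow", t1, t2).
# A substitution is a list of (stamp, type) pairs; the first pair wins.


def apply_subst(s, t):
    """Apply the substitution s to the type t."""
    if t[0] == "var":
        for stamp, image in s:
            if stamp == t[1]:
                return image
        return t  # s behaves like the identity outside its list
    if t[0] == "arrow":
        return ("arrow", apply_subst(s, t[1]), apply_subst(s, t[2]))
    return t  # ("nat",)


def domain(s):
    """The set of stamps v such that s(v) differs from v."""
    keys = {stamp for stamp, _ in s}
    return {v for v in keys if apply_subst(s, ("var", v)) != ("var", v)}


def compose(s2, s1):
    """The substitution that applies s1 first and then s2."""
    keys1 = {stamp for stamp, _ in s1}
    through_s1 = [(v, apply_subst(s2, t)) for v, t in s1]
    only_s2 = [(v, t) for v, t in s2 if v not in keys1]
    return through_s1 + only_s2


s1 = [(0, ("var", 1))]   # alpha := beta
s2 = [(1, ("nat",))]     # beta  := nat
t = ("arrow", ("var", 0), ("var", 1))
```

Even in this small form, compose has to treat separately the variables handled by s1 and those handled only by s2, which gives an idea of why the corresponding correctness proof is long. Note also that domain is computable here precisely because the substitution is a finite list.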


We could also represent substitutions (both kinds) by Coq abstractions of 
type stamp -> type. The application of a substitution to a variable and the 
composition of two substitutions are then operations obtained for free. This kind of 
representation is very attractive and often chosen in proof assistants based on 
λ-calculus. It is essentially the representation chosen by Naraschewski and 
Nipkow [15]. However the functional representation makes the implementation of 
some operations (e.g. the computation of the domain) impossible, and we need 
such an operation. 


The notion of generic instances induces an ordering between type schemes: 
a type scheme σ1 is said to be more general than a type scheme σ2, written 
σ1 > σ2 or in Coq (more_general σ1 σ2), if and only if any arbitrary generic 
instance of σ2 is also a generic instance of σ1. For example, ∀αβ.α → β is more 
general than ∀α.α → α. 
The translation in Coq is straightforward: 


Definition more_general: type_scheme -> type_scheme -> Prop := 
  [ts1, ts2 : type_scheme] 
  (∀ t: type, (is_gen_instance t ts2) -> (is_gen_instance t ts1)) 

The Coq notation [x: T]e binds the identifier x of type T in e. 
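
The generic instance relation underlying this ordering can be illustrated by a small matching procedure. The Python sketch below is an assumption-laden illustration (tuple encodings and the name is_gen_instance mirror the text, but the algorithm is not claimed to be the one used in the Coq development): a type is a generic instance of a scheme when the generic variables of the scheme can be bound consistently to subterms of the type.

```python
# Types:        ("nat",), ("var", stamp), ("arrow", t1, t2)
# Type schemes: ("nat_ts",), ("var_ts", stamp), ("gen_var", stamp),
#               ("arrow_ts", ts1, ts2)


def is_gen_instance(t, ts):
    """True when t is obtained from ts by instantiating generic variables."""
    binding = {}  # generic stamp -> type chosen for it

    def match(ts, t):
        kind = ts[0]
        if kind == "nat_ts":
            return t == ("nat",)
        if kind == "var_ts":
            # Free variables must be left untouched.
            return t == ("var", ts[1])
        if kind == "gen_var":
            # First occurrence records the image; later ones must agree.
            return binding.setdefault(ts[1], t) == t
        # arrow_ts: both sides must match under the same binding.
        return (t[0] == "arrow"
                and match(ts[1], t[1])
                and match(ts[2], t[2]))

    return match(ts, t)


NAT = ("nat",)
forall_ab = ("arrow_ts", ("gen_var", 0), ("gen_var", 1))  # ∀αβ.α → β
forall_aa = ("arrow_ts", ("gen_var", 0), ("gen_var", 0))  # ∀α.α → α
```

For instance nat → nat is a generic instance of both schemes, whereas nat → (nat → nat) is an instance of ∀αβ.α → β only; this witnesses that ∀α.α → α is not more general than ∀αβ.α → β.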


This ordering induces in turn a partial order between environments: Γ1 is said 
to be more general than the environment Γ2 (Γ1 > Γ2) if and only if Γ1 and Γ2 
are relative to the same identifiers¹ x1, x2...xn, and ∀i ∈ [1,n], Γ1(xi) > Γ2(xi). 


2.5 Type Generalization 


The let construct is the only one that may introduce true polymorphic types² 
in the environment. This is done by the operation of generalization gen_type 
which builds a type scheme from a type τ and an environment Γ: it turns into 
generic variables those variables appearing free in τ but not in Γ. 


gen_type τ Γ = ∀α1...αn.τ 


with αi ∈ (FV_type τ) \ (FV_env Γ) (as indicated by its name, FV_env computes 
the list of free variables of an environment). 


The most natural Coq implementation that follows from the representations 
of types and type schemes is a function (see below) defined by case analysis 
according to the type to be generalized. 


1 We impose without any loss of generality the same order for the identifiers in both 
environments. 
2 That is, non trivial type schemes. 
Fixpoint gen_type:= [t: type] [env: type_env] 
  Cases t of 
    Nat -> Nat_ts 
  | (Var v) -> if v ∈ (FV_env env) then (Var_ts v) else (Gen_var v) 
  | (Arrow t1 t2) -> (Arrow_ts (gen_type t1 env) (gen_type t2 env)) 
  end. 


However we do not implement the generalization exactly in this way. The 
implemented algorithm shares the same structure but incorporates a linear 
encoding of generic variables: any occurrence of the generic variable α is encoded 
as (Gen_var n) if α is the nth generic variable discovered during the 
generalization. Thus the generalization of the type (α → β) → (β → α) with respect 
to the empty environment is the type scheme ((Gen_var 0) → (Gen_var 1)) 
→ ((Gen_var 1) → (Gen_var 0)). 

This encoding provides us with the following property: two type schemes ob- 
tained by generalization, identical up to the renaming of the generic variables, 
are represented by two terms (of type type_scheme) syntactically equal. Nevert- 
heless the price for this is high: because of the encoding, the lemmas involving 
generalization have an inductive step not very natural, close to an invariant (see 
[6] for more details). 
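
The linear encoding can be sketched in a few lines of Python. This is an illustration only: the tuple encoding is an assumption, and for brevity the function takes the set of free variables of the environment directly (the hypothetical parameter env_free_vars) instead of computing FV_env from an environment.

```python
# Types:        ("nat",), ("var", stamp), ("arrow", t1, t2)
# Type schemes: ("nat_ts",), ("var_ts", stamp), ("gen_var", n),
#               ("arrow_ts", ts1, ts2)


def gen_type(t, env_free_vars):
    """Generalize t, numbering generic variables in order of discovery."""
    numbering = {}  # stamp of a generalized variable -> its rank

    def go(t):
        if t[0] == "nat":
            return ("nat_ts",)
        if t[0] == "var":
            stamp = t[1]
            if stamp in env_free_vars:
                return ("var_ts", stamp)      # stays free
            if stamp not in numbering:
                numbering[stamp] = len(numbering)
            return ("gen_var", numbering[stamp])
        # Arrow: the left component is traversed before the right one.
        left = go(t[1])
        right = go(t[2])
        return ("arrow_ts", left, right)

    return go(t)


def swap_type(a, b):
    """The type (a -> b) -> (b -> a) for two variable stamps a and b."""
    va, vb = ("var", a), ("var", b)
    return ("arrow", ("arrow", va, vb), ("arrow", vb, va))


G0, G1 = ("gen_var", 0), ("gen_var", 1)
expected = ("arrow_ts", ("arrow_ts", G0, G1), ("arrow_ts", G1, G0))
```

Generalizing swap_type(7, 3) and swap_type(20, 10) with an empty set of environment variables produces the same term, expected, which is the syntactic-equality property stated above.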


2.6 The Typing Rules and Some of Their Properties 


The typing rules are given in the Natural Semantics style [10] (see figure 1). 
They are described as inference rules expressing how to derive typing sequents 
of the form Γ ⊢ e : τ. Such a sequent is read: the expression e has type τ under 
the environment Γ. 

The typing rules are encoded in Coq as clauses of the inductive relation 
type_of, the translation is quite obvious here. Here is a fragment of the Coq 
specification: 


Inductive type_of: type_env -> expr -> type -> Prop := 
  type_of_const: ∀ env: type_env, ∀ n: nat, (type_of env (Const n) Nat) 
| type_of_var: ∀ env: type_env, ∀ x: ident, 
    ∀ t: type, ∀ ts: type_scheme, 
    (assoc_ident_in_env x env)=(Some ts) -> 
    (is_gen_instance t ts) -> (type_of env (Variable x) t) 
| type_of_lam: ∀ env: type_env, ∀ x: ident, ∀ e: expr, ∀ t, t’: type, 
    (type_of (add_env env x (type_to_type_scheme t)) e t’) -> 
    (type_of env (Lam x e) (Arrow t t’)) 


An important property, that appears as a key property for many other 
properties, states that the relation type_of is stable under substitution. 


Theorem typing_is_stable_under_substitution: 
  ∀ e: expr, ∀ t: type, ∀ env: type_env, ∀ s: substitution, 
  (type_of env e t) -> 
  (type_of (apply_subst_env env s) e (apply_subst_type s t)) 
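(CST)   Γ ⊢ n : nat 


        Γ(x) = σ,  τ is a generic instance of σ 
(ID)    ---------------------------------------- 
        Γ ⊢ x : τ 


        Γ ⊕ x : ∀.τ ⊢ e : τ’ 
(ABS)   ---------------------------------------- 
        Γ ⊢ λx.e : τ → τ’ 


        Γ ⊕ x : ∀.τ ⊕ f : ∀.τ → τ’ ⊢ e : τ’ 
(REC)   ---------------------------------------- 
        Γ ⊢ Rec f x.e : τ → τ’ 


        Γ ⊢ e : τ → τ’,  Γ ⊢ e’ : τ 
(APP)   ---------------------------------------- 
        Γ ⊢ e e’ : τ’ 


        Γ ⊢ e : τ,  Γ ⊕ x : (gen_type τ Γ) ⊢ e’ : τ’ 
(LET)   ---------------------------------------- 
        Γ ⊢ let x = e in e’ : τ’ 


Fig. 1. The typing rules 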


Our Coq verification of this theorem deals explicitly with α-conversion. It 
requires for example to formalize the notion of renaming substitution and to 
verify mechanically under what conditions a type and a renamed version of it 
are generalized into the same type scheme. In that sense, our proof is fundamental 
because we really formalize and verify informal proofs that often become a little 
bit nebulous as soon as they deal with renaming. For details, see [6]. 


Some other properties about the typing rules are also required in different 
points in our whole development, for example the following property connecting 
typing sequents together with the ordering > between environments: 


Theorem typing_in_a_more_general_env: 
  ∀ e: expr, ∀ Γ1, Γ2: type_env, ∀ τ: type, 
  Γ1 > Γ2 -> (type_of Γ2 e τ) -> (type_of Γ1 e τ) 


3 Big Step Dynamic Semantics 


The big step dynamic semantics gives a meaning to the expressions of the lan- 
guage by defining their evaluation. Again we chose to specify it in the style of 
Natural Semantics. Let us first describe the possible values and then the inference 
rules. 
3.1 Semantic Values and Evaluation Environments 


The values we consider here are numbers and functional values also called closu- 
res. The value of an expression depends on the values of its free variables. These 
values are recorded in an evaluation environment (environment for short when 
there is no ambiguity), written A or c in the following. 

Values and environments are mutually recursive notions. In effect, a closure 
is a pair composed of a functional expression and an environment. Two kinds 
of closures are distinguished: recursive closures << Rec f x.e, A >> and non 
recursive closures << λx.e, A >>. This discrimination is also introduced by 
Boutin in his ML compiler certification [2]. We could also consider opaque 
closures (closures whose contents cannot be inspected): these are the values associated 
to predefined operations. They can be ignored without any loss of generality. 

Within Coq, values and environments are specified by two mutually inductive 
types val and eval_env (isomorphic to lists of pairs (identifier, value))³. 


Mutual Inductive val: Set := Num: nat -> val 
| Clos: ident -> expr -> eval_env -> val 
| Rec_clos: ident -> ident -> expr -> eval_env -> val 
with eval_env : Set := Cnil: eval_env 
| Ccons: ident*val -> eval_env -> eval_env 


Structurally the evaluation environments are very similar to the typing 
environments. Consequently we use similar notations: A(x) (in Coq, 
assoc_ident_in_eval), A ⊕ y : v. 


3.2 The Dynamic Semantics 


The inference rules that describe the dynamic semantics are detailed in 
figure 2. The evaluation sequent A ⊢eval e ⇒ v is read: the expression e evaluates 
to the value v in the environment A. 


The inference rules implement a call by value semantics. The distinction 
between the recursive and non recursive closures implies the definition of two in- 
ference rules for the application: (APP1) when a non recursive function is applied 
and (APP2) when a recursive function is applied. 


The inference rules in figure 2 are translated into the inductive predicate 
val_of (a constructor per inference rule): 


Inductive val_of: eval_env -> expr -> val -> Prop := 
  Val_of_num: ∀ n: nat, ∀ c: eval_env, (val_of c (Const n) (Num n)) 
| Val_of_ident: ∀ c: eval_env, ∀ i: ident, ∀ v: val, 
    (assoc_ident_in_eval i c) = (Some v) -> (val_of c (Variable i) v) 
| Val_of_lambda: ∀ c: eval_env, ∀ i: ident, ∀ e: expr, 
    (val_of c (Lam i e) (Clos i e c)) 
| Val_of_app1: ∀ c,c1: eval_env, ∀ e1,e2,e: expr, ∀ i: ident, 
    ∀ u,v: val, (val_of c e1 (Clos i e c1)) -> (val_of c e2 u) -> 
    (val_of (Ccons (i,u) c1) e v) -> (val_of c (App e1 e2) v) 


3 We use here a separate type for environments and not the predefined lists: this 
choice is due to a limitation of the Coq version V6.1 we used. 


(CST)    A ⊢eval n ⇒ n 

(ID)     A ⊢eval x ⇒ A(x) 

(ABS)    A ⊢eval λx.e ⇒ << λx.e, A >> 

(REC)    A ⊢eval Rec f x.e ⇒ << Rec f x.e, A >> 


         A ⊢eval e ⇒ << λx.e_f, A_f >>,  A ⊢eval e’ ⇒ v, 
         A_f ⊕ x : v ⊢eval e_f ⇒ v’ 
(APP1)   ------------------------------------------------ 
         A ⊢eval e e’ ⇒ v’ 


         A ⊢eval e ⇒ << Rec f x.e_f, A_f >>,  A ⊢eval e’ ⇒ v, 
         A_f ⊕ x : v ⊕ f : << Rec f x.e_f, A_f >> ⊢eval e_f ⇒ v’ 
(APP2)   ------------------------------------------------ 
         A ⊢eval e e’ ⇒ v’ 


         A ⊢eval e ⇒ v,  A ⊕ x : v ⊢eval e’ ⇒ v’ 
(LET)    ------------------------------------------------ 
         A ⊢eval let x = e in e’ ⇒ v’ 


Fig. 2. Big step dynamic semantics 
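
Read operationally, the rules of figure 2 describe a call by value interpreter. The Python sketch below is an informal illustration and not the Coq predicate val_of: expressions and values are encoded as tagged tuples, environments as tuples of (identifier, value) pairs, and the function names lookup and evaluate are assumptions of this example. A failing lookup or an application of a number raises an exception, which corresponds to the absence of a derivation.

```python
# Expressions: ("const", n), ("var", x), ("lam", x, e), ("rec", f, x, e),
#              ("app", e1, e2), ("let", x, e1, e2)
# Values:      ("num", n), ("clos", x, e, env), ("rec_clos", f, x, e, env)


def lookup(x, env):
    """A(x): the most recent binding of x in the environment."""
    for ident, value in env:
        if ident == x:
            return value
    raise KeyError(x)


def evaluate(env, e):
    """Return v such that env |- e => v, following the rules of figure 2."""
    kind = e[0]
    if kind == "const":                          # (CST)
        return ("num", e[1])
    if kind == "var":                            # (ID)
        return lookup(e[1], env)
    if kind == "lam":                            # (ABS)
        return ("clos", e[1], e[2], env)
    if kind == "rec":                            # (REC)
        return ("rec_clos", e[1], e[2], e[3], env)
    if kind == "app":
        fun = evaluate(env, e[1])
        arg = evaluate(env, e[2])
        if fun[0] == "clos":                     # (APP1)
            _, x, body, closure_env = fun
            return evaluate(((x, arg),) + closure_env, body)
        if fun[0] == "rec_clos":                 # (APP2)
            _, f, x, body, closure_env = fun
            # f is bound after x, to the recursive closure itself.
            return evaluate(((f, fun), (x, arg)) + closure_env, body)
        raise TypeError("a number cannot be applied")
    if kind == "let":                            # (LET)
        bound = evaluate(env, e[2])
        return evaluate(((e[1], bound),) + env, e[3])
    raise ValueError("unknown expression")


identity = ("lam", "x", ("var", "x"))
program = ("let", "id", identity, ("app", ("var", "id"), ("const", 3)))
```

Evaluating program in the empty environment () follows (LET), (ABS), (APP1) and (ID) and yields the number 3. A non terminating expression makes evaluate loop (in practice, exceed Python's recursion limit), which reflects the fact that the big step semantics assigns it no value.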


3.3 Typing Soundness or the Subject Reduction Theorem 


The proved property is that any well-typed expression of type τ whose 
evaluation terminates has a value of type τ. This formulation shows that we need to 
formalize the notion of type for a value. It is immediate for a natural number 
constant but not for a closure because it is not an object of the language. 
Consequently we specify an inductive predicate called semantic typing (type_of_val 
in Coq) that links a value to its type: we write ⊨ v : τ to indicate that the value 
v has the type τ. 

Furthermore the typing/evaluation connection can only be made if the 
typing environment and the evaluation environment agree (we write Γ ⊢ A to 
denote that property; the corresponding Coq predicate is eval_type_env_match). 


Proving ML Type Soundness Within Coq 135 


It means the value associated to an identifier in A has the type (more precisely 
the type scheme) indicated in the typing environment Γ. 

A value v is assigned the type scheme σ ((sem_gen v σ) in Coq) if v has 
every type obtained as a generic instance of σ. 

To define the semantic typing predicate, we follow Tofte’s approach 
reformulated by Leroy in [12]. Thus we use the typing rules to type a closure. Informally, 
it means that the value << λx.e, A >> has the type τ1 → τ2 if there exists a 
typing environment Γ that agrees with the evaluation environment A (Γ ⊢ A) 
and such that the typing sequent Γ ⊢ λx.e : τ1 → τ2 can be derived. 

The Coq formalization (given below) of the previous predicates raises no 
particular problem. Their definitions are mutually inductive. Let us notice that 
our Coq definition for Γ ⊢ A adds a constraint about the order of the identifiers 
in both A and Γ: it must be the same. This constraint simplifies the formulation 
and the proof but is not restrictive at all. 


Mutual Inductive type_of_val: val -> type -> Prop := 
  type_num: ∀ n: nat, (type_of_val (Num n) Nat) 
| type_closure: ∀ i: ident, ∀ e: expr, ∀ c: eval_env, 
    ∀ env: type_env, ∀ t1, t2 : type, 
    (eval_type_env_match c env) -> 
    (type_of env (Lam i e) (Arrow t1 t2)) -> 
    (type_of_val (Clos i e c) (Arrow t1 t2)) 
| type_rec_closure: similar to the previous clause 


with eval_type_env_match: eval_env -> type_env -> Prop := 
  match_nil: (eval_type_env_match Cnil nil) 
| match_cons: ∀ c: eval_env, ∀ env: type_env, ∀ i: ident, 
    ∀ ts: type_scheme, ∀ v: val, 
    (sem_gen v ts) -> (eval_type_env_match c env) -> 
    (eval_type_env_match (Ccons (i,v) c) (cons (i,ts) env)) 


with sem_gen: val -> type_scheme -> Prop := 
  sem_gen_def: ∀ v: val, ∀ ts: type_scheme, 
    (∀ t: type, (is_gen_instance t ts) -> (type_of_val v t)) -> 
    (sem_gen v ts) 


The formulation of the typing soundness theorem in pseudo-Coq is as follows: 


Theorem subject_reduction: 
  ∀ e: expr, ∀ v: val, ∀ A: eval_env, ∀ Γ: type_env, ∀ τ: type, 
  A ⊢eval e ⇒ v -> Γ ⊢ e : τ -> 
  Γ ⊢ A -> ⊨ v : τ 


The proof proceeds by induction on A ⊢eval e ⇒ v. Not surprisingly, the 
let step is the most difficult one. It requires the following lemma: if a value u 
has the type τ, then it also has the type scheme resulting from the generalization 
of τ. 


Lemma sem_gen_gen_type: ∀ u: val, ∀ τ: type, ∀ Γ: type_env, 
  (type_of_val u τ) -> (sem_gen u (gen_type τ Γ)). 
Proving this lemma consists in establishing that for any generic instance τ’ 
of (gen_type τ Γ), the value u has the type τ’. The generic instance τ’ can be 
written as a type instance of τ (according to a lemma required in the certification 
of W). The end of the proof rests upon the property type_val_stable_subst, 
close to the stability of typing by substitutions: if u has the type τ then u also 
has the type sτ for any substitution s. 


Lemma type_val_stable_subst: ∀ v: val, ∀ τ: type, ∀ s: substitution, 
  (type_of_val v τ) -> (type_of_val v (extend_subst_type s τ)) 


To verify this last property, we need to use a mutual induction scheme 
(between evaluation environments and values) generated automatically by Coq. In 
fact we prove simultaneously a similar property about the typing/evaluation 
environments connection: if Γ ⊢ A then sΓ ⊢ A for any substitution s. 
Here again we use the property of preservation of the typing sequents by 
substitution. 


The formalization of the big step semantics together with the proof of 
type soundness requires about thirty supplementary definitions and a hundred 
new lemmas with respect to the certification of W. It is very little compared 
with the 7500 lines (91 definitions and 322 lemmas) for verifying W. 


4 Reduction Dynamic Semantics 


The dynamic semantics presented in the previous section is called big step se- 
mantics because it gives no information about the computation. It only considers 
the possible values resulting from the evaluation of an expression. It follows that 
the big step semantics cannot deal with non terminating programs. The reduc- 
tion semantics, also called small step semantics, specifies the elementary steps of 
the computation and consists of a bunch of rewriting rules. Consequently we can 
observe the reduction of an expression step by step (through a derivation) either 
for ever (if it is a non terminating expression) or until an expression in normal 
form is obtained (if the initial expression terminates). In our case, an expression 
in normal form also called a value is a constant or an abstraction (recursive or 
not). 

Using a reduction semantics to establish type soundness for languages à la 
ML has been popularized by Wright and Felleisen in [23] and again afterwards 
by other researchers, for example Rémy and Vouillon when they specified the 
semantics of Objective ML [18]. With such an approach a type error is modelled 
as a locked reduction, that is the impossibility to further reduce a non value 
expression. In this context, establishing type soundness consists in verifying two 
properties: the preservation of the type by reduction (also called the subject 
reduction theorem) and the non-locking of well typed programs. 

In this section we present the Coq formalization of the reduction semantics 
and prove the preservation of the type by reduction. We have also proved the 
non-locking property by establishing that any well-typed program that cannot 
reduce anymore is a value. This last part, not developed here for lack of space, 
does not raise any specific difficulty. 

The Coq theories relative to this section add a dozen definitions and about 
thirty-five lemmas. 


4.1 The Coq Specification of the Reduction Semantics 


The formalization of the reduction semantics is modular: it consists of three 
steps: 


— the definition of the subset of the expressions that are values, 

— the definition of the evaluation contexts that indicate where the reductions 
are allowed. A context is an expression that contains a hole written •. The 
notation C[e] denotes the expression obtained by placing an expression e in 
the hole of C. 

The contexts are described by the following grammar 


C ::= • | C e | v C | let x = C in e 


where v and e are respectively a value and an expression. 
Defining such contexts amounts to imposing a reduction strategy. For 
instance, the right hand side of an application can be reduced only if the left 
hand side is a value. 

— the definition of the reduction relation →r that specifies the elementary 
reductions. 


These different steps are modelled within Coq as follows: 


values The subset of the expressions which are values is described as the set of 
expressions such that the predicate is_value is proved to be satisfied: 


Inductive is_value: expr -> Prop := 
  Cst_val : ∀ n: nat, (is_value (Const n)) 
| fun_val : ∀ i: ident, ∀ e: expr, (is_value (Lam i e)) 


contexts One originality of our work is that we explicitly formalize the notion 
of evaluation context. In fact, while many Coq contributions about λ-calculus 
use a reduction relation, few of them formalize the notion of context. 
We have chosen to represent a context (of type context) as a function on 
expressions. Then context is synonymous with expr -> expr. Consequently 
C[e] is translated to the application (c e) where c is the functional 
representation of C. The following inductive definition is_MLcontext describes the 
allowed contexts. 


Inductive is_MLcontext: context -> Prop := 
  hole : (is_MLcontext ([x: expr] x)) 
| app_left : ∀ e: expr, (is_MLcontext ([x: expr] (App x e))) 
| app_right : ∀ v: expr, (is_value v) -> 
    (is_MLcontext ([x: expr] (App v x))) 
| let_left : ∀ e: expr, ∀ i: identifier, 
    (is_MLcontext ([x: expr] (Let_in i x e))) 
the reduction relation →r It contains three rules that express β-reduction 
and a fourth one which specifies that the relation →r is context compatible. 


(β1) (λx.e) v →r e[v/x] 


(β2) (Rec f x.e) v →r e[v/x, Rec f x.e/f] 


(β3) let x = v in e →r e[v/x] 


            e1 →r e2 
(CONTEXT)   ------------------ 
            C[e1] →r C[e2] 


The formalization of the relation →r requires the preliminary specification 
of the following notions: 

— free/bound identifier (inductive predicate free_ident) 

— substituting an expression e’ for x in an expression e: this operation is 
not allowed when free identifiers may be captured (we do not implement 
automatic renaming). The easiest way to implement a partial 
operation in Coq⁴ is to switch to a relational version defined inductively, 
subst_expr. Thus (subst_expr e x e’ e”) means e” is the expression 
obtained by replacing in the expression e all the free occurrences of the 
identifier x by the expression e’ (e” = e[e’/x]). 

The relation →r is written in Coq as an inductive predicate red with four 
constructors corresponding to the rules (β1), (β2), (β3) and (CONTEXT). 


Inductive red: expr -> expr -> Prop := 
  beta1: ∀ i: ident, ∀ e,v,er: expr, 
    (is_value v) -> (subst_expr e i v er) -> 
    (red (App (Lam i e) v) er) 
| beta2, beta3 follow a similar construction 
| context: ∀ e1,e2: expr, ∀ ctx: context, 
    (red e1 e2) -> (is_MLcontext ctx) -> (red (ctx e1) (ctx e2)) 
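
The three ingredients (values, contexts, elementary reductions) can be put together in a small executable sketch. The Python code below is an illustration under assumptions and does not reproduce the Coq relations: expressions are tagged tuples, substitution is a partial function returning None when a free identifier would be captured (no automatic renaming, as in the text), recursive abstractions are counted as values as stated in the introduction of this section, and the names free_idents, subst, is_value and step are chosen for this example.

```python
# Expressions: ("const", n), ("var", x), ("lam", x, e), ("rec", f, x, e),
#              ("app", e1, e2), ("let", x, e1, e2)


def free_idents(e):
    kind = e[0]
    if kind == "const":
        return set()
    if kind == "var":
        return {e[1]}
    if kind == "lam":
        return free_idents(e[2]) - {e[1]}
    if kind == "rec":
        return free_idents(e[3]) - {e[1], e[2]}
    if kind == "app":
        return free_idents(e[1]) | free_idents(e[2])
    return free_idents(e[2]) | (free_idents(e[3]) - {e[1]})  # let


def under_binders(binders, body, m):
    """Substitute in body, which sits under binders; None on capture."""
    live = {x: r for x, r in m.items()
            if x not in binders and x in free_idents(body)}
    if any(free_idents(r) & set(binders) for r in live.values()):
        return None
    return subst(body, live)


def subst(e, m):
    """Simultaneous substitution e[m]; None when it is not allowed."""
    kind = e[0]
    if kind == "const" or not m:
        return e
    if kind == "var":
        return m.get(e[1], e)
    if kind == "lam":
        body = under_binders([e[1]], e[2], m)
        return None if body is None else ("lam", e[1], body)
    if kind == "rec":
        body = under_binders([e[1], e[2]], e[3], m)
        return None if body is None else ("rec", e[1], e[2], body)
    if kind == "app":
        left, right = subst(e[1], m), subst(e[2], m)
        return None if left is None or right is None else ("app", left, right)
    bound = subst(e[2], m)                       # let x = e1 in e2
    body = under_binders([e[1]], e[3], m)
    return None if bound is None or body is None else ("let", e[1], bound, body)


def is_value(e):
    return e[0] in ("const", "lam", "rec")


def step(e):
    """One reduction step, or None when no rule applies."""
    if e[0] == "app":
        fun, arg = e[1], e[2]
        if not is_value(fun):                    # context C e
            r = step(fun)
            return None if r is None else ("app", r, arg)
        if not is_value(arg):                    # context v C
            r = step(arg)
            return None if r is None else ("app", fun, r)
        if fun[0] == "lam":                      # (beta 1)
            return subst(fun[2], {fun[1]: arg})
        if fun[0] == "rec":                      # (beta 2)
            return subst(fun[3], {fun[2]: arg, fun[1]: fun})
        return None                              # a constant is applied
    if e[0] == "let":
        if not is_value(e[2]):                   # context let x = C in e
            r = step(e[2])
            return None if r is None else ("let", e[1], r, e[3])
        return subst(e[3], {e[1]: e[2]})         # (beta 3)
    return None
```

An expression for which step returns None and which is not a value is a locked reduction, i.e. a type error in the sense described above; for example the application of the constant 1 to the constant 2. The order of the tests in step encodes the reduction strategy imposed by the contexts: the argument of an application is reduced only once the function part is a value.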


4.2 Subject Reduction Theorem 


The subject reduction property states that reductions preserve the type of 
expressions. It is given below in the case of an elementary reduction step (it can 
be easily extended to the closure of →r). 


Theorem reduction_preserves_types: ∀ e1,e2: expr, 
  (e1 →r e2) -> 
  (∀ τ: type, ∀ Γ: type_env, Γ ⊢ e1 : τ -> Γ ⊢ e2 : τ) 


* The Coq functions are incurably total functions (see [7] and [17] about encoding 
partial functions). 
The proof proceeds by case analysis according to the reduction $e_1 \to_r e_2$.
In the step corresponding to the context compatibility, we finish the proof by
case analysis on the form of the context.
For reductions involving $\beta$-reduction a substitution lemma is the key to showing
type preservation.


Lemma substitution:
  ∀ e,e1,e2: expr, ∀ τ,τ1: type, ∀ Γ: type_env, ∀ i: ident,
  Γ ⊢ e : τ ->
  Γ ⊕ i:(gen_type τ Γ) ⊢ e1 : τ1 ->
  (subst_term e1 i e e2) -> Γ ⊢ e2 : τ1


First of all, let us notice the formulation of the substitution lemma. Usually
in the literature the typing hypothesis about $e_1$ assigns to the identifier $i$ the type
scheme $\forall\alpha_1,\ldots,\alpha_n.\tau$ where $\alpha_1,\ldots,\alpha_n$ are type variables not free in $\Gamma$. Thus
the lemma we prove in our formalization can be seen as a specialization of the
usual one. However it suffices to establish the type soundness and furthermore
a lot of technical lemmas about gen_type were already available.

To verify this lemma, we proceed by induction on the expression e. The 
heaviest induction step concerns the abstraction because renaming of type va- 
riables is required. However numerous required properties have been established 
for verifying the property of preservation of typing sequents by substitution. 

More generally, the proof makes an intensive use of the relation > and the 
connected properties as for example the lemma typing_in_a_more_general_env 
(displayed in section 2.6). It remains to establish several supplementary lemmas 
about the type system as for example the extension lemma (see below). This 
lemma states that adding (or removing) in the environment a type information 
about an identifier non free in the expression has no impact on the conclusion 
of a typing sequent. 


Lemma env_extension :
  ∀ e: expr, ∀ x: ident, ∀ τ: type, ∀ σ: type_scheme, ∀ Γ: type_env,
  ¬(is_free x e) -> ((Γ ⊕ x:σ) ⊢ e : τ <-> Γ ⊢ e : τ)


The proof of this lemma is based on a very specific equivalence relation
between environments, $\Gamma \approx \Gamma'$, used nowhere else. This equivalence is
built mainly on the following two clauses:

\[ \Gamma \approx \Gamma' \;\rightarrow\; (i:\sigma)(i:\sigma')\Gamma \approx (i:\sigma)\Gamma' \]

\[ \Gamma \approx \Gamma' \;\rightarrow\; (i:\sigma)(j:\sigma')\Gamma \approx (j:\sigma')(i:\sigma)\Gamma' \quad \text{when } i \neq j \]

The first clause removes useless information, the second one swaps the
information for two distinct identifiers. An important lemma about this notion states
that if the typing sequent $\Gamma \vdash e : \tau$ is valid then we can also derive the sequent
$\Gamma' \vdash e : \tau$ when $\Gamma \approx \Gamma'$.


5 Equivalence of Type Systems 


All the work presented until now uses a syntax-directed presentation (called 
DM’ in the paper) of the type system. Although this version is commonly used, 
it is not the initial version (called DM in the paper) given by Damas and Milner.
This last one is not syntax-directed. We have chosen to use the presentation DM’ 
because it is closer to W than DM and it is deterministic. This feature makes 
many proofs easier. 

The systems DM and DM' are equivalent. A paper proof can be found in
[9] but, to our knowledge, no formal and mechanized proof exists. We prove this
equivalence with Coq, more precisely the soundness and completeness of DM'
with respect to DM. It requires seventeen supplementary definitions and seventy
lemmas.


5.1 Formalization of the Damas-Milner Type System DM 


The DM typing rules are given in figure 3: the typing sequent is now
$\Gamma \vdash_{DM} e : \sigma$ (a type scheme is used instead of a type). We follow here
the presentation given in [4] by Clément et al. We also use our favorite notations
(e.g. a trivial type scheme (a type) is written $\forall.\tau$).


The rules (CST), (ABS), (REC), (APP) are identical in both systems: in DM, 
they handle trivial type schemes, likened to types. The rule (TAUT) related to 
identifiers is very simple : it only extracts the type information from the envi- 
ronment. The (LET) rule does no generalization at all, but on the other hand 
the rule (GEN) may introduce some supplementary quantified variables in a type 
scheme. The rule (INST) may weaken the type scheme of an expression. 


The inductive predicate type_of_DM, illustrated partly below, formalizes the
DM system in Coq. The relationship with the inference rules is again
straightforward, except for the constructor type_DM_gen that quantifies a list of
variables instead of a unique variable in the rule (GEN). The function bind_list
implements the binding of variables in a type scheme. Its definition is very close
to the definition of the function gen_type: it introduces a similar linear encoding
of the generic variables.


Inductive type_of_DM: type_env -> expr -> type_scheme -> Prop :=
  type_DM_taut: ∀ env: type_env, ∀ x: ident, ∀ ts: type_scheme,
    (assoc_ident_in_env x env)=(Some ts) ->
    (type_of_DM env (Variable x) ts)
| type_DM_inst: ∀ env: type_env, ∀ e: expr, ∀ ts,ts': type_scheme,
    (type_of_DM env e ts) -> ts > ts' -> (type_of_DM env e ts')
| type_DM_gen: ∀ env: type_env, ∀ e: expr, ∀ ts: type_scheme,
    (type_of_DM env e ts) ->
    ∀ l: (list stamp), (are_disjoints l (FV_env env)) ->
    (type_of_DM env e (bind_list l ts))


| type_DM_let_in: ∀ env: type_env, ∀ e,e': expr,
    ∀ x: ident, ∀ ts,ts': type_scheme,
    (type_of_DM env e ts) ->
    (type_of_DM (add_env env x ts) e' ts') ->
    (type_of_DM env (Let_in x e e') ts')
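\[
\begin{array}{ll}
(\mathrm{CST}) & \Gamma \vdash_{DM} n : \forall.\mathit{nat} \\[1ex]
(\mathrm{TAUT}) & \Gamma \vdash_{DM} x : \Gamma(x) \\[1ex]
(\mathrm{INST}) & \dfrac{\Gamma \vdash_{DM} e : \sigma \qquad \sigma > \sigma'}{\Gamma \vdash_{DM} e : \sigma'} \\[2ex]
(\mathrm{GEN}) & \dfrac{\Gamma \vdash_{DM} e : \sigma \qquad \alpha \notin FV(\Gamma)}{\Gamma \vdash_{DM} e : \forall\alpha.\sigma} \\[2ex]
(\mathrm{ABS}) & \dfrac{\Gamma \oplus x : \forall.\tau \vdash_{DM} e : \forall.\tau'}{\Gamma \vdash_{DM} \lambda x.e : \forall.\tau \to \tau'} \\[2ex]
(\mathrm{REC}) & \dfrac{\Gamma \oplus x : \forall.\tau \oplus f : \forall.\tau \to \tau' \vdash_{DM} e : \forall.\tau'}{\Gamma \vdash_{DM} \mathrm{Rec}\ f\ x.e : \forall.\tau \to \tau'} \\[2ex]
(\mathrm{APP}) & \dfrac{\Gamma \vdash_{DM} e : \forall.\tau \to \tau' \qquad \Gamma \vdash_{DM} e' : \forall.\tau}{\Gamma \vdash_{DM} e\ e' : \forall.\tau'} \\[2ex]
(\mathrm{LET}) & \dfrac{\Gamma \vdash_{DM} e : \sigma \qquad \Gamma \oplus x : \sigma \vdash_{DM} e' : \sigma'}{\Gamma \vdash_{DM} \mathrm{let}\ x = e\ \mathrm{in}\ e' : \sigma'}
\end{array}
\]


Fig. 3. The DM type system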


5.2 Soundness of DM’ with Respect to DM 


We demonstrate the soundness of DM' with respect to DM by showing that if
we can prove with the DM' typing rules that e has type $\tau$ then we can prove
with the DM typing rules that e also has the type $\tau$ (or the trivial type scheme
$\forall.\tau$). All that is done with the same environment.


Lemma soundness_DM'_wrt_DM: ∀ e: expr, ∀ Γ: type_env, ∀ τ: type,
  Γ ⊢ e : τ -> Γ ⊢DM e : ∀.τ


The proof is done by induction on the expression e. Most cases are established
by applying the corresponding rule in DM and the induction hypothesis.
The case where e is an identifier requires successively applying the rules (TAUT)
and (INST). The let case needs more effort; in particular we have to rewrite
(bind_list (gen_vars τ Γ) ∀.τ) as (gen_type τ Γ). Intuitively, this rewriting
means that binding (in one step) all the variables of $\tau$ not free in $\Gamma$ (computed
by (gen_vars τ Γ)) gives exactly the same result as generalizing $\tau$ with respect
to $\Gamma$. We have here a syntactic equality because of the encoding encapsulated in
both bind_list and gen_type.
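
The rewriting step can be illustrated on a toy representation of types and type schemes. The Python sketch below is an assumption-laden illustration, not the Coq encoding: the development uses a linear encoding of generic variables that is not shown in this section, whereas here a scheme is simply a tuple of bound names plus a body, and variables are listed in order of first occurrence. The names TVar, Arrow, Scheme, type_vars and fv_env are hypothetical; gen_vars, bind_list and gen_type only borrow the names used in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple, Union


@dataclass(frozen=True)
class TVar:
    name: str


@dataclass(frozen=True)
class Nat:
    pass


@dataclass(frozen=True)
class Arrow:
    dom: "Type"
    cod: "Type"


Type = Union[TVar, Nat, Arrow]


@dataclass(frozen=True)
class Scheme:
    bound: Tuple[str, ...]  # quantified variables, in binding order
    body: Type


def type_vars(t: Type) -> List[str]:
    """Variables of t in order of first occurrence, without duplicates."""
    if isinstance(t, TVar):
        return [t.name]
    if isinstance(t, Arrow):
        out = type_vars(t.dom)
        out += [v for v in type_vars(t.cod) if v not in out]
        return out
    return []


def fv_env(env: Dict[str, Scheme]) -> Set[str]:
    """Type variables occurring free in some scheme of the environment."""
    free: Set[str] = set()
    for s in env.values():
        free |= {v for v in type_vars(s.body) if v not in s.bound}
    return free


def gen_vars(t: Type, env: Dict[str, Scheme]) -> List[str]:
    """Variables of t that are not free in env."""
    blocked = fv_env(env)
    return [v for v in type_vars(t) if v not in blocked]


def bind_list(vs: List[str], s: Scheme) -> Scheme:
    """Quantify the listed variables, in list order, in front of s."""
    return Scheme(tuple(vs) + s.bound, s.body)


def gen_type(t: Type, env: Dict[str, Scheme]) -> Scheme:
    """Generalize t with respect to env in a single traversal of t."""
    blocked = fv_env(env)
    bound: List[str] = []

    def walk(u: Type) -> None:
        if isinstance(u, TVar):
            if u.name not in blocked and u.name not in bound:
                bound.append(u.name)
        elif isinstance(u, Arrow):
            walk(u.dom)
            walk(u.cod)

    walk(t)
    return Scheme(tuple(bound), t)
```

With env = {"x": Scheme((), TVar("a"))} and t = b → (a → (c → b)), both bind_list(gen_vars(t, env), Scheme((), t)) and gen_type(t, env) yield the same scheme with bound variables (b, c). The equality is syntactic here only because both functions enumerate variables in the same order, which loosely mirrors the role of the shared encoding mentioned above.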


5.3 Completeness of DM’ with Respect to DM 


The completeness theorem states that if we can prove with the DM typing rules
that e has the type scheme $\sigma$ then we can establish with the DM' typing rules
that e has a type $\tau$ whose generalization provides a type scheme at least as
general as $\sigma$.


Lemma completeness_DM'_wrt_DM:
  ∀ e: expr, ∀ Γ: type_env, ∀ σ: type_scheme,
  Γ ⊢DM e : σ -> ∃τ. Γ ⊢ e : τ ∧ (gen_type τ Γ) > σ


The proof requires an induction on $\Gamma \vdash_{DM} e : \sigma$ and is based on numerous
lemmas about the > relation. For example:


σ1 > σ2 -> sσ1 > sσ2   where s is a substitution


(are_disjoints l (FV_env Γ)) ->
(gen_type τ Γ) > σ -> (gen_type τ Γ) > (bind_list l σ)


(bind_list l σ) > σ   and   (gen_type τ Γ) > ∀.τ


The proofs of these lemmas are clumsy and technical: most require exhibiting
generic substitutions.


6 Conclusion 


In this paper, we have specified in the Calculus of Inductive Constructions the
abstract syntax, the type system and the dynamic semantics of a polymorphic
functional fragment of the core ML language. We have verified one of its most
fundamental properties, namely that ML typing is sound. We have experimented
with two kinds of semantics: evaluation and reduction.

The ML language and its type system have often been extended, mainly with the
aim of offering more flexibility to the programmer: extensible records, mutable
values, objects, overloading, and so on. To validate such modifications, our
formal development may serve as a basis for investigating the properties
of the new language (does it preserve or violate the properties established
initially?). Beyond this necessary step-by-step checking, our objective is to develop a
formal framework (based on the Calculus of Inductive Constructions and Coq)
that allows one to define type systems à la ML and to reason about them. Intuitively,
this requires both the construction of a library of formal components
and a methodology for composing and re-using formal pieces.
Isabelle/HOL

Abstract. Our recent, and still ongoing, development of real analysis
in Isabelle/HOL is presented and compared, whenever instructive, to the
one present in the theorem prover HOL. While most existing mechanizations
of analysis only use the classical $\epsilon$ and $\delta$ approach, ours uses notions
from both Nonstandard Analysis and classical analysis. The overall result 
is an intuitive, yet rigorous, development of real analysis, and a relatively 
high degree of proof automation in many cases. 


1 Introduction 


The development of analysis in Isabelle/HOL [10] is based on both the reals and
the hyperreals of Robinson's Nonstandard Analysis (NSA) [12]. The real numbers,
$\mathbb{R}$, are constructed in the theorem prover using the Dedekind cuts method
[5] and then extended to give the hyperreals (denoted by $\mathbb{R}^*$) by means of the
ultrapower construction [13,4]. Thus, when working in the hyperreals, $\mathbb{R}$ can
be viewed as a proper subfield of $\mathbb{R}^*$, with the latter also containing new
nonstandard numbers such as infinitesimals and infinite numbers. By contrast, the
development of analysis in HOL, for example, rests purely on the real numbers
constructed using a variant of Cantor's method developed by Harrison [6].

Our approach follows the HOL methodology and proceeds strictly through 
definitions. This ensures that all theory extensions are conservative, thereby 
guaranteeing consistency. Such an approach is especially suitable for a rigorous 
development of infinitesimals, as many of the attacks on these numbers in the 
past have been due to inconsistent axiomatizations. In the next sections, we give 
an overview of the various types of numbers available in Isabelle/HOL, and of 
the mechanization of some real analysis in the theorem prover. 


2 Nonstandard Numbers 


An immediate consequence of our decision to formalize nonstandard rather than 
standard analysis is the extra amount of work spent on number constructions. 
The ultrapower construction of the hyperreals, for example, first required pro- 
ving Zorn’s Lemma and developing a theory of filters and ultrafilters for Isa- 
belle/HOL. We have described details of the construction elsewhere [4], and so 
will only outline a few of the aspects relevant to this paper in what follows. 


J. Harrison and M. Aagaard (Eds.): TPHOLs 2000, LNCS 1869, pp. 145-161, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 
2.1 On the Construction


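The construction of the hyperreals resembles to some extent that of the reals
from the rationals using equivalence classes induced by Cauchy sequences. In
this case, however, a free ultrafilter $U_{\mathbb{N}}$ over the natural numbers is used to
partition the set of all sequences of real numbers into equivalence classes. The
free ultrafilter $U_{\mathbb{N}}$, whose existence is proved using Zorn's Lemma, is a collection
of subsets of $\mathbb{N}$ with the following properties (amongst others):


\[
\begin{array}{ll}
\emptyset \notin U_{\mathbb{N}} \text{ and } \mathbb{N} \in U_{\mathbb{N}} & X \in U_{\mathbb{N}} \Rightarrow \neg\,\mathrm{finite}\ X \\
X \in U_{\mathbb{N}} \wedge Y \in U_{\mathbb{N}} \Rightarrow X \cap Y \in U_{\mathbb{N}} & X \in U_{\mathbb{N}} \Leftrightarrow -X \notin U_{\mathbb{N}} \\
X \in U_{\mathbb{N}} \wedge X \subseteq Y \Rightarrow Y \in U_{\mathbb{N}} &
\end{array}
\]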


In Isabelle, the following equivalence relation on sequences of real numbers is 
then defined: 


hyprel :: ((nat ⇒ real) * (nat ⇒ real)) set
hyprel ≡ {p. ∃r s. p = (r,s) ∧ {n. r(n) = s(n)} ∈ Uℕ}


The set of equivalence classes, that is the quotient set, arising from hyprel is
used to define the new type hypreal denoting the hyperreals:


hypreal ≡ {x::(nat ⇒ real). True}/hyprel


Thus, it follows from the definition of hyprel that for two hyperreals to be
equal, the corresponding entries in their equivalence class representatives must
be equal at an infinite number of positions. This is because $U_{\mathbb{N}}$ cannot contain
any finite set. Once the new type has been introduced, Isabelle provides coercion
functions (the abstraction and representation functions) that enable the basic
operations to be defined. In this particular case, the functions


Abs_hypreal :: (nat ⇒ real) set ⇒ hypreal
Rep_hypreal :: hypreal ⇒ (nat ⇒ real) set


are added to the theory such that hypreal and {x::(nat ⇒ real). True}/hyprel
are isomorphic by Rep_hypreal and its inverse Abs_hypreal.

The familiar operations (addition, subtraction, multiplication, inverse) and
the ordering relation on the new type hypreal are then defined in terms of
pointwise operations on the underlying sequences. For example, let $[\langle X_n\rangle]$ denote the
equivalence class (i.e. hyperreal) containing $\langle X_n\rangle$; then multiplication is defined
by


\[ [\langle X_n\rangle] \cdot [\langle Y_n\rangle] = [\langle X_n \cdot Y_n\rangle] \qquad (1) \]

or, more specifically, in Isabelle as:


P · Q ≡ Abs_hypreal (⋃ X ∈ Rep_hypreal(P).
          ⋃ Y ∈ Rep_hypreal(Q). hyprel^^{λn. X n · Y n})

where
⋃x ∈ A. B[x] ≡ {y. ∃x ∈ A. y ∈ B[x]} (union of family of sets).
r^^s ≡ {y. ∃x ∈ s. (x,y) ∈ r} (image of set s under relation r).


Equation (1) above is in fact proved as a theorem. All the expected field pro- 
perties of the hyperreals are easily proved since they follow nicely from the 
corresponding properties of the reals. We define an embedding of the reals in the 
hyperreals by having the following map in Isabelle: 


hypreal_of_real :: real ⇒ hypreal
hypreal_of_real r ≡ Abs_hypreal (hyprel^^{λn::nat. r})


In other words, we represent each real number r in $\mathbb{R}^*$ by the equivalence class
$[\langle r,r,r,\ldots\rangle]$. The properties of the embedding function, with respect to
multiplication, addition and so on, follow trivially since they are just special cases of
the operations on the hyperreals. In what follows, we will denote an embedded 
real r by r unless we use the Isabelle embedding function explicitly. 


2.2 Numbers Big and Small 


The embedding function enables us to define the set of embedded reals SReal
explicitly, and prove that it is a proper subfield of $\mathbb{R}^*$. The proof shows that
the well-defined hyperreal $[\langle 1,2,3,\ldots\rangle]$ (denoted by $\omega$) cannot be equal to any
of the embedded reals as no singleton set is allowed in $U_{\mathbb{N}}$. Once the embedding
is defined and various of its properties proved, we formalize the definitions
characterizing the various types of numbers that make up the new extended
field:

Infinitesimal ≡ {x. ∀r ∈ SReal. 0 < r ⟶ abs x < r}

Finite ≡ {x. ∃r ∈ SReal. abs x < r}

Infinite ≡ −Finite

With this done, a number of theorems are proved, including: 


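\[
\frac{x \in \mathrm{Infinitesimal} \quad y \in \mathrm{Infinitesimal}}{x \ \mathit{op}\ y \in \mathrm{Infinitesimal}}
\qquad
\frac{x \in \mathrm{Finite} \quad y \in \mathrm{Finite}}{x \ \mathit{op}\ y \in \mathrm{Finite}}
\]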


where op is $+$, $-$, or $\times$ (i.e. both sets are subrings of $\mathbb{R}^*$). Other Isabelle theorems
proved include, amongst many others:


\[
\frac{x \in \mathrm{Infinitesimal} \quad y \in \mathrm{Finite}}{x \times y \in \mathrm{Infinitesimal}}
\qquad
\frac{z \in \mathrm{Infinitesimal} \quad x < y}{x + z < y}
\]


In all, we prove over 250 theorems describing the properties of the hyperreals
and their inter-relationships. In addition, we use our free ultrafilter to extend
the natural numbers and construct the hypernatural numbers, $\mathbb{N}^*$. This additional
type of nonstandard numbers provides us with infinitely large numbers
greater than all the members of $\mathbb{N}$. The set of infinite hypernaturals is denoted
by HNatInfinite in Isabelle. We also define the function hypnat_of_nat, an
embedding of the natural numbers into the hypernaturals [4].
3 Important Concepts from Nonstandard Analysis


Before we can mechanize any proofs from elementary analysis, we need to define
a few more important concepts that will provide us with an adequate framework.
Firstly, we define the crucial infinitely close relation $\approx$:


x ≈ y ≡ x − y ∈ Infinitesimal


This is an equivalence relation about which we prove a number of properties¹
such as:


[|a ≈ b; c ≈ d|] ⟹ a + c ≈ b + d

[|a ∈ SReal; b ∈ SReal|] ⟹ (a ≈ b) = (a = b)

x ∈ Finite ⟹ ∃!r. r ∈ SReal ∧ x ≈ r        (2)
[|a ≈ b; b ∈ Finite|] ⟹ a ∈ Finite          (3)


Theorem (2) above is known as the Standard Part Theorem and is especially
important as it enables us to formalize the notion of standard part. The standard
part of a finite nonstandard number is defined as the unique real number
infinitely close to it. The actual definition in Isabelle uses the Hilbert operator
$\varepsilon$:

st :: hypreal ⇒ hypreal

st x ≡ (εr. x ∈ Finite ∧ r ∈ SReal ∧ r ≈ x)


This definition would be sufficient if we were only working in the hyperreals. 
However, since we are concerned with the formalization of real analysis, and 
want to give both the standard and nonstandard definitions of various concepts, 
we define a second version of the standard part operation. This is used to return 
a number of type real rather than an embedded real: 


str :: hypreal ⇒ real
str x ≡ (εr. x ∈ Finite ∧ hypreal_of_real r ≈ x)


All the important properties of the standard part operator are proved. These 
include, for example: 


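\[
\frac{x \in \mathrm{SReal}}{\mathrm{st}\ x = x}
\qquad
\frac{x \in \mathrm{Finite}}{\mathrm{st}\ x \approx x}
\qquad
\frac{x \in \mathrm{Finite} \quad y \in \mathrm{Finite}}{(x \approx y) = (\mathrm{st}\ x = \mathrm{st}\ y)}
\]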


3.1 Nonstandard Extensions 


Nonstandard extensions provide systematic ways through which sets and functions
defined on the reals are extended to the hyperreals (a process sometimes
known as the *-transform [8]).


¹ ∃!x. P stands for the unique existence quantifier while the Isabelle notation
[|φ1; ...; φn|] ⟹ ψ abbreviates the nested implication φ1 ⟹ (... φn ⟹ ψ ...).
In particular, if f is a function from $\mathbb{R}$ to $\mathbb{R}$, then it can be extended to a
function $f^*$ from $\mathbb{R}^*$ to $\mathbb{R}^*$ by the following rule: $x = [\langle X_n\rangle] \in \mathbb{R}^*$ maps into
$y = [\langle Y_n\rangle] = f^*(x) \in \mathbb{R}^*$ if and only if $\{n \in \mathbb{N}.\ f(X_n) = Y_n\} \in U_{\mathbb{N}}$. In Isabelle,
this is rendered as:


*f* :: (real ⇒ real) ⇒ hypreal ⇒ hypreal
*f* f x ≡ Abs_hypreal (⋃ X ∈ Rep_hypreal(x). hyprel^^{λn. f(X n)})


Thus, the nonstandard extension operator provides a generic way through which, 
given a function taking standard arguments, we can define an analogous one that 
accepts nonstandard arguments. In what follows, we will denote the nonstandard 
extension of a given real function f either by f* or by its equivalent Isabelle 
notation (*f*f). We prove this important simplification theorem: 


(*f* f) (Abs_hypreal (hyprel^^{λn. X n})) =
  (Abs_hypreal (hyprel^^{λn. f(X n)}))


In other words, we have that $f^*[\langle X_n\rangle] = [\langle f(X_n)\rangle]$. This is useful as it allows us
to formalize definitions and prove properties of nonstandard functions by couching
them in terms of the corresponding real functions and our free ultrafilter.
We easily prove a number of theorems about nonstandard extensions such as
$f^*(r) = f(r)$ and $f^*(x) + g^*(x) = (\lambda u.\ f(u) + g(u))^*(x)$. We will come across
others as we further outline our formalization of analysis.
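
The pointwise character of the extension can be seen on representatives alone. The following Python sketch is a loose illustration under our own assumptions: a hyperreal is modelled only by one representative sequence (a function from indices to floats), and the names of_real, star, mul and agree_on are hypothetical. Membership in the free ultrafilter is not computable, so the check below merely compares representatives on a finite sample of indices; sequences that agree at every index are certainly in the same class, since $\mathbb{N}$ itself belongs to the ultrafilter.

```python
from typing import Callable, Iterable

Seq = Callable[[int], float]  # one representative of a hyperreal


def of_real(r: float) -> Seq:
    """Representative of an embedded real: the constant sequence (r, r, r, ...)."""
    return lambda n: r


def star(f: Callable[[float], float]) -> Callable[[Seq], Seq]:
    """Pointwise extension on representatives: (X_n) is sent to (f(X_n))."""
    return lambda X: (lambda n: f(X(n)))


def mul(X: Seq, Y: Seq) -> Seq:
    """Pointwise product of two representatives, as in equation (1)."""
    return lambda n: X(n) * Y(n)


def agree_on(X: Seq, Y: Seq, indices: Iterable[int]) -> bool:
    """Finite sanity check only: do X and Y coincide on the given indices?"""
    return all(X(n) == Y(n) for n in indices)


# omega is represented by (1, 2, 3, ...)
omega: Seq = lambda n: float(n + 1)
square = star(lambda x: x * x)
```

For instance, agree_on(square(omega), mul(omega, omega), range(50)) holds, and applying square to of_real(3.0) gives the constant sequence 9.0, matching $f^*(r) = f(r)$ for an embedded real.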

We also extend functions from $\mathbb{N}$ to $\mathbb{R}$: given such a function s, its
*-transform is the function $s^* : \mathbb{N}^* \to \mathbb{R}^*$ where $s^*([\langle X_n\rangle]) = [\langle s(X_n)\rangle]$ for any
$[\langle X_n\rangle] \in \mathbb{N}^*$. In Isabelle, the nonstandard extension is denoted by (*fNat* s)
and is useful in the formalization of sequences, for example (see Section 5.1).


4 Nonstandard Versus $\epsilon$ and $\delta$ Formalization


In general, one of the main advantages of the nonstandard approach is the way in 
which it simplifies the statement of many concepts from analysis. Nonstandard 
definitions tend to reflect the intuitive understanding that one has of particular 
concepts. For example, the standard formulation of uniform continuity: 


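\[ \forall\epsilon.\ (0 < \epsilon \rightarrow \exists\delta.\ (0 < \delta \wedge \forall x\,y.\ (0 < |x - y| < \delta \rightarrow |f(x) - f(y)| < \epsilon))) \]
can be contrasted with the corresponding nonstandard one:
\[ \forall x\,y.\ x \approx y \rightarrow f^*(x) \approx f^*(y). \]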


The second definition is not only concise and simple, but it also provides a
rigorous, yet geometrically intuitive, characterization of the behaviour of a uniformly
continuous function. The $\epsilon$ and $\delta$ definition, however, not only lacks an intuitive
reading but also leads to more complicated proofs since one has to deal with the
existential quantifier.
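
As a small worked illustration of the nonstandard criterion (our own example, not one taken from the mechanization), consider $f(x) = 2x$ and $g(x) = x^2$. The sketch uses $\omega = [\langle 1,2,3,\ldots\rangle]$ from Section 2.2, the closure of Infinitesimal under addition, and assumes the usual fact that $1/\omega$ is infinitesimal:

```latex
\begin{align*}
x \approx y \;&\Rightarrow\; f^*(x) - f^*(y) = 2(x - y) \in \mathrm{Infinitesimal}
  \;\Rightarrow\; f^*(x) \approx f^*(y), \\
g^*(\omega + 1/\omega) - g^*(\omega) &= 2 + 1/\omega^2 \not\approx 0
  \quad\text{although}\quad \omega + 1/\omega \approx \omega .
\end{align*}
```

So f satisfies the nonstandard definition of uniform continuity while g does not, and neither argument has to produce a witness for $\delta$.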
Our work shows that the use of NSA benefits the mechanization of analysis:
the formalization is often simpler and shorter, and the proofs benefit from a
higher degree of automation since the alternating quantifiers of the $\epsilon$ and $\delta$
approach are avoided. In our mechanization of analysis, we introduce for each
concept both its standard and nonstandard definitions. In each case, we prove 
the equivalence of the two definitions, thereby providing us with a theorem with 
which we can re-cast properties that we are trying to prove in terms of equivalent 
nonstandard notions. 


5 Mechanized Theories 


Our mechanization process in Isabelle/HOL has been influenced significantly 
by that of Harrison in HOL [6]. Indeed, the substantial and highly focused for- 
malization of elementary analysis in HOL has often provided us with guidance 
during our mechanization process. Moreover, it has also given us some means 
of analyzing, albeit rather roughly, the benefits that the use of NSA brings to 
the mechanization of analysis. We next give an overview of some of the theories
mechanized in Isabelle/HOL.


5.1 Sequences and Series 


We follow Harrison's approach in HOL, and provide a relational definition for
the limit of a sequence which we denote by $\longrightarrow$. Thus, in Isabelle, $X \longrightarrow l$
stands for "X tends to l" and has the following standard definition:


\[ X \longrightarrow l \;\equiv\; \forall\epsilon.\ (0 < \epsilon \rightarrow (\exists N.\ \forall n.\ N \le n \rightarrow \mathrm{abs}(X_n - l) < \epsilon)) \]


It might not be immediately obvious that this definition is intended to capture
the idea that terms “far enough” along the sequence can get arbitrarily close to
l. Our formalization, however, also provides a nonstandard definition of limit,
denoted by $\xrightarrow[NS]{}$, that immediately captures the intuition. It is defined by the
following simpler statement which does not involve any existential quantifiers:


\[ X \xrightarrow[NS]{} l \;\equiv\; (\forall N \in \mathrm{HNatInfinite}.\ (\texttt{*fNat*}\ X)\,N \approx l) \]


Our first task, as is the case each time we introduce a new concept from 
analysis, is to prove that the two definitions are equivalent. We will not elaborate 
on the details of the proof here, but simply remark that for all of the equivalence 
proofs formalized, it is usually trickier to prove that the nonstandard definition 
implies the standard one — in each case, this requires the use of the Axiom 
of Choice, for example. This remark leads to another important point worth 
mentioning: all the equivalence proofs follow the same general pattern. This is 
not a coincidence and is related to one of the central features of NSA, known 
as the Transfer Principle which provides a context in which true statements 
about $\mathbb{R}$ are transformed into statements about $\mathbb{R}^*$ [8,7]. Details about the
mechanization of the equivalence proofs, and other related remarks, can be found
in the author’s PhD thesis [4]. 

Once this important proof is mechanized, we can formally justify using non- 
standard methods to prove standard theorems of elementary analysis. As for the 
properties of sequential limits, the following theorems, for example, are all proved 
automatically since there is no need to instantiate any existential variables: 


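\[
\frac{X \xrightarrow[NS]{} a \quad Y \xrightarrow[NS]{} b}{(\lambda n.\ X_n + Y_n) \xrightarrow[NS]{} a + b}
\qquad
\frac{X \xrightarrow[NS]{} a \quad Y \xrightarrow[NS]{} b}{(\lambda n.\ X_n \cdot Y_n) \xrightarrow[NS]{} a \cdot b} \quad (4)
\]
\[
\frac{X \xrightarrow[NS]{} a}{(\lambda n.\ -X_n) \xrightarrow[NS]{} -a}
\qquad
\frac{X \xrightarrow[NS]{} a \quad X \xrightarrow[NS]{} b}{a = b}
\]


Theorem (4), for instance, is easily proved using (3) and a theorem about the
preservation of multiplication across *-transforms:


(*fNat* f) N · (*fNat* g) N = (*fNat* (λx. f x · g x)) N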


Surveying the development of formalized analysis in HOL, Harrison observes 
that theorems about Cauchy sequences are the crucial ones [6]. As expected, 
these are also the important ones in our development. As in the case of sequential 
limits, we formalize a standard definition: 


Cauchy :: (nat ⇒ real) ⇒ bool
Cauchy X ≡ ∀ε. (0 < ε ⟶ (∃M. (∀m n. M ≤ m ∧ M ≤ n
              ⟶ abs (X m − X n) < ε)))


and a nonstandard one: 


NSCauchy :: (nat ⇒ real) ⇒ bool
NSCauchy X ≡ ∀M ∈ HNatInfinite. ∀N ∈ HNatInfinite.
               (*fNat* X) M ≈ (*fNat* X) N


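The equivalence of the two definitions is proven and, with this done, the main
theorem relating Cauchy sequences to convergence:


\[ (\exists l.\ X \longrightarrow l) \iff \mathrm{Cauchy}\ X \]


is easily mechanized, since we can replace $\longrightarrow$ by the equivalent $\xrightarrow[NS]{}$ and
then unfold the nonstandard definition.


Proof Outline:
$\Rightarrow$ part:

\[
\begin{array}{rl}
X \xrightarrow[NS]{} l & \Longrightarrow\; X^*_n \approx l \approx X^*_m \text{ for all } n, m \in \mathrm{HNatInfinite} \\
 & \Longrightarrow\; \mathrm{Cauchy}\ X
\end{array}
\]

$\Leftarrow$ part:²

\[
\begin{array}{rl}
\mathrm{Cauchy}\ X & \Longrightarrow\; \mathrm{NSBseq}\ X \\
 & \Longrightarrow\; X^*_\Omega \in \mathrm{Finite} \text{ where } \Omega \text{ is the infinite hypernatural } [\langle n\rangle] \\
 & \Longrightarrow\; X^*_\Omega \approx l \text{ for some real number } l \text{, by theorem (2)} \\
 & \Longrightarrow\; X^*_\Omega \approx l \approx X^*_n \text{ for all infinite } n
\end{array}
\]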


The mechanization of this proof compares favourably with the one formalized by 
Harrison in HOL [6]. Both of them need the lemma stating that every Cauchy 
sequence is bounded but, otherwise, differ significantly in their formalization. 
Aside from dealing with the inherent difficulties associated with $\epsilon$ and $\delta$ proofs,
Harrison needs to define the concept of a subsequence and several other auxiliary 
notions. He then proves the more or less involved theorem that every sequence 
contains a monotonic subsequence. One might argue that all this diverts atten- 
tion from what is actually being proved; our formalization, by contrast, is direct 
and intuitive: the mechanized proof is only 7 steps long. 


Series 


In classical analysis, despite the notation $\sum_{i=0}^{\infty} a_i$, one does not try to
interpret the expression literally as an infinite sum. Instead, one considers the sums
of finitely many of the terms of the series, and examines the behaviour of such
sums as an increasingly large, but still finite, number of terms are allowed [7].
In other words, an infinite series is defined as the limit of $\sum_{i=0}^{n} a_i$ as $n \to \infty$.
Using our framework, however, it is possible to use the nonstandard criterion for
sequential convergence to define literally infinite sums.

We first define, using Isabelle's primitive recursion package, the standard
notion of finite sum $\sum_{i=m}^{n-1} f(i)$:


consts sumr :: [nat, nat, (nat ⇒ real)] ⇒ real

primrec

  sumr m 0 f = 0

  sumr m (Suc n) f = if n < m then 0 else sumr m n f + f(n)


Isabelle automatically checks whether the reduction rules for sumr satisfy a pri- 
mitive recursive definition, and then adds them to the simplifier. Thus, the prim- 
rec package provides a safe way of defining primitive recursion on datatypes in 
Isabelle/HOL [11]. All the expected theorems about finite sum are proved, mostly 
using induction followed by simplification. We will not expand on these here but 
instead consider how to define the nonstandard extension of sumr. 
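
To see what the two reduction rules compute, they can be transcribed directly into a recursive function. This Python sketch is only an illustration of the equations above (recursion on the second argument, with the guard n < m); it is not Isabelle code, and it uses floats where the theory uses reals.

```python
from typing import Callable


def sumr(m: int, n: int, f: Callable[[int], float]) -> float:
    """Transcription of the two primrec equations for sumr."""
    if n == 0:
        return 0.0  # sumr m 0 f = 0
    k = n - 1  # n = Suc k
    if k < m:
        return 0.0  # below the lower bound: empty sum
    return sumr(m, k, f) + f(k)  # sumr m (Suc k) f = sumr m k f + f(k)
```

Unfolding the recursion shows that sumr m n f adds f(m), ..., f(n-1) and is 0 whenever n ≤ m; for example, sumr(2, 5, float) adds 2 + 3 + 4.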

The nonstandard extension of the finite sum operation cannot be obtained 
by applying the *fNat* operator directly since this can only extend functions of 
a single variable. This is not a problem, however, as we can define the extension 


² The NSA formulation of boundedness, $\mathrm{NSBseq}\ X \equiv \forall n \in \mathrm{HNatInfinite}.\ X^*_n \in \mathrm{Finite}$,
as proved in Isabelle, is used in this part of the mechanization.
in the same way that multiplication and addition, for example, were defined on
the hyperreals:


sumhr :: (hypnat * hypnat * (nat ⇒ real)) ⇒ hypreal
sumhr p ≡ (λ(M,N,f). Abs_hypreal (⋃ X ∈ Rep_hypnat M.
            ⋃ Y ∈ Rep_hypnat N. hyprel^^{λn. sumr (X n) (Y n) f})) p


Without showing the coercion functions explicitly, this definition is simply as- 
serting that 


\[ \mathrm{sumhr}\ ([\langle X_n\rangle], [\langle Y_n\rangle], f) = [\langle \mathrm{sumr}\ X_n\ Y_n\ f\rangle] \qquad (5) \]


This enables us to have possibly infinite hypernatural limits. Theorem (5) above 
is proved and is useful to the simplifier. We also mechanize various theorems 
which show that sumhr preserves the behaviour of finite summation [4]. Other 
interesting theorems include: 


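\[ \mathrm{sumhr}\ (0, M, f) \approx \mathrm{sumhr}\ (0, N, f) \Longrightarrow \mathrm{abs}\ (\mathrm{sumhr}\ (M, N, f)) \approx 0 \]


and an important theorem about the convergence of series in terms of the
nonstandard Cauchy criterion:


\[ (\exists s.\ (\lambda n.\ \mathrm{sumr}\ 0\ n\ f) \longrightarrow s) \Longrightarrow
   (\forall M \in \mathrm{HNatInfinite}.\ \forall N \in \mathrm{HNatInfinite}.\ \mathrm{abs}\ (\mathrm{sumhr}\ (M, N, f)) \approx 0) \]


This last theorem makes it trivial to prove that the terms of a convergent series
tend to zero, i.e.


\[ (\exists s.\ (\lambda n.\ \mathrm{sumr}\ 0\ n\ f) \longrightarrow s) \Longrightarrow f \longrightarrow 0 \]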


In HOL, a functional definition is available that returns the limit sum to 
which an infinite series converges [6]. We follow the same approach and define a 
standard function suminf: 


\[ \mathrm{suminf}\ f \equiv \varepsilon s.\ (\lambda n.\ \mathrm{sumr}\ 0\ n\ f) \longrightarrow s \]


which stands for $\sum_{i=0}^{\infty} f_i = s$. However, we can exploit the fact that our
framework offers a literal interpretation for the sum of an infinite series and prove the
following theorem:


\[ \mathrm{suminf}\ f = \mathrm{str}\ (\mathrm{sumhr}\ (0, \Omega, f)) \qquad (6) \]


where $\Omega$ is the infinite hypernatural $[\langle n\rangle]$ defined in Isabelle. Of course, any
infinite hypernatural can be used in the theorem above: $\Omega$ is just one of the
simplest to define.

The nonstandard form of suminf provides a nice device which will be useful 
to define the transcendental functions in Isabelle. With this aim in mind, and 
following Harrison’s approach in HOL, we also prove two of the most important 
convergence tests for series: the comparison and ratio tests. In both cases, the 
proofs formalized in Isabelle are purely standard although some of their steps 
use theorems derived by nonstandard means. The formalization in HOL now 
becomes a valuable guide to us: it indicates what lemmas need to be proved to 
bring our mechanization of the theorems through. 
5.2 Continuity and Differentiation


The first crucial concept defined is that of pointwise limit. Once again, we provide
both the traditional $\epsilon$ and $\delta$ formulation:


\[ f \xrightarrow{a} l \;\equiv\; \forall\epsilon.\ 0 < \epsilon \rightarrow (\exists\delta.\ 0 < \delta \wedge
   (\forall x.\ 0 < \mathrm{rabs}(x - a) \wedge \mathrm{rabs}(x - a) < \delta \rightarrow \mathrm{rabs}(f\,x - l) < \epsilon)) \]


and a nonstandard one: 


\[ f \xrightarrow[NS]{a} l \;\equiv\; \forall x.\ x \neq a \wedge x \approx a \rightarrow (\texttt{*f*}\ f)\,x \approx l \]


and prove their equivalence. Both $f \xrightarrow{a} l$ and $f \xrightarrow[NS]{a} l$ stand for f having
limit l as x approaches a. We prove properties that are similar to those of
sequential limits and once more with a high degree of automation. We carry out a
simple experiment in which we compare purely $\epsilon$ and $\delta$ proofs of limit properties
with the corresponding nonstandard ones in Isabelle [4]. The nonstandard proof
of the product of limits, for example, is automatic while its standard proof is
about 15 steps long, requires a case split due to the linearity of the reals, and
explicit instantiation of the $\epsilon$ and $\delta$ properties.


Continuity 


Once the properties of pointwise limits have been formalized, they can be 
used to provide the standard definition of continuity: 


\[ \mathrm{isCont}\ f\ a \equiv (f \xrightarrow{a} f\,a) \]


This states that a function f is continuous at a real point a if f(x) tends to f(a) 
as x tends to a. This motivates the following nonstandard definition: 


\[ \mathrm{isNSCont}\ f\ a \equiv (\forall x.\ x \approx a \rightarrow (\texttt{*f*}\ f)\,x \approx f(a)) \]

An important point to note is that the formalization makes it explicit that 
the definition is referring to the embedded copies of the reals a and f(a) in 
the hyperreals. The equivalence of the two definitions follows immediately from 
that of standard and nonstandard limits. We prove automatically that the sum, 
product, and division of continuous functions are also continuous. All these are 
direct consequences of the corresponding theorems for pointwise limits. 

However, we have a second method for proving them: the theorems are all 
simple algebraic consequences of the nonstandard formulation of continuity. This 
provides a uniform approach that bypasses the limit results and provides simple, 
intuitive proofs. Moreover, it has the added advantage that, unlike the approach 
based on limits, it can easily prove that the composition of continuous functions 
is continuous, i.e.,


[| isCont f a; isCont g (f a) |] ==> isCont (g ∘ f) a

Proof. If x ≈ a then f*(x) ≈ f(a), and so it follows that g*(f*(x)) ≈ g(f(a)). □

This result is proved in one step by Isabelle’s automatic tactic. Its forma- 
lization can be contrasted with the corresponding one in HOL, which required 
unfolding the limit definition and instantiating ε and δ properties since the theorem
cannot be derived from limit properties [6]. This hints at another useful, and
powerful aspect of nonstandard methods in theorem proving: their simple alge- 
bra enables them to deal uniformly with a wide range of theorems. An analogous 
problem, easily avoided with NSA, can be noticed if the standard treatment is 
used to mechanize the chain rule of differentiation. 


Differentiation 


We now give a brief outline of the development of differentiation which builds 
upon the theories already described. The standard formulation states that a
function f has a derivative d at a point x if (f(x + h) − f(x))/h → d as h → 0.
In Isabelle, we formalize a relational definition DERIV(x) f :> d meaning ‘the
derivative of f at x is d’ as


DERIV(x) f :> d ≡ (λh. (f(x + h) − f(x))/h) -- 0 --> d

and a corresponding nonstandard one:


NSDERIV(x) f :> d ≡ ∀h ∈ Infinitesimal − {0}. (f*(x + h) − f(x))/h ≈ d

where the real point x becomes an embedded value and h a non-zero infinitesimal.
The nonstandard definition simply reflects the intuition behind the Leibnizian
notation dy/dx and treats differentiation as a quotient operation. We prove that
the nonstandard definition has an equivalent statement in terms of limits, and
hence that it is equivalent to the standard definition. We also prove another
nonstandard characterization for the differentiability of a function f at a real
point x:


NSDERIV(x) f :> d ⟺ ∀y. y ≈ x ∧ y ≠ x ⟶ (f*(y) − f(x))/(y − x) ≈ d        (8)

We easily mechanize all the expected properties about the differentiation of 
simple functions and their combination. The task is simple as we can avoid 
limits in favour of simple algebraic manipulations of infinitesimals. 

Reporting on his formalization of derivatives in HOL, Harrison remarks that 
the formalization of the chain rule is not as straightforward as it might seem 
[6]. Indeed, when using the standard definition, the property cannot be derived 
using limits since these cannot be composed. This forces a direct and rather 
cumbersome proof that can potentially complicate mechanization. Using NSA, 
however, the chain rule 


[| NSDERIV(a) g :> e; NSDERIV(g a) f :> d |] ==> NSDERIV(a) (f ∘ g) :> d · e

admits a straightforward mechanization since it has the following simple,
algebraic proof relying on equivalence theorem (8):


$$\frac{f^*(g^*(x)) - f(g(a))}{x - a} = \frac{f^*(g^*(x)) - f(g(a))}{g^*(x) - g(a)} \cdot \frac{g^*(x) - g(a)}{x - a} \approx d \cdot e$$


Our mechanization, as one sees, is a direct rendering of the classical notation 
used to denote the chain rule: 
$$\frac{df}{dx} = \frac{df}{dg} \cdot \frac{dg}{dx}$$


and should be contrasted to the one in HOL, where Harrison has to formalize the 
Carathéodory derivative to provide the machinery necessary for simple derivation 
of the chain rule. In our work, we next use the chain rule to provide nice algebraic 
proofs of the theorems about the inverse and quotient of functions [4]. 
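
The factorization used in the chain-rule proof can be illustrated numerically. The following is only a sketch under stated assumptions: a small real h stands in for a non-zero infinitesimal, so this is ordinary floating-point arithmetic and not nonstandard analysis. The function names difference_quotient and chain_rule_factors, and the sample functions f and g, are hypothetical choices made for illustration.

```python
import math


def difference_quotient(f, x, h):
    """Standard difference quotient (f(x + h) - f(x)) / h for a small real h."""
    return (f(x + h) - f(x)) / h


def chain_rule_factors(f, g, a, h):
    """Split the quotient for f o g at a into the two factors used in the proof.

    Assumes g(a + h) != g(a), mirroring the non-zero denominator in the proof.
    """
    dg = g(a + h) - g(a)
    outer = (f(g(a + h)) - f(g(a))) / dg  # approximates d, the derivative of f at g(a)
    inner = dg / h                        # approximates e, the derivative of g at a
    return outer, inner


# Hypothetical sample functions: f = sin, g(x) = x^2 + 1.
f = math.sin
g = lambda x: x * x + 1.0
a, h = 0.5, 1e-6

outer, inner = chain_rule_factors(f, g, a, h)
composite = difference_quotient(lambda x: f(g(x)), a, h)
exact = math.cos(g(a)) * 2.0 * a  # d * e computed from the known derivatives
```

The product outer * inner agrees with composite up to rounding, which is the algebraic identity behind the proof; both are close to exact only because h is small.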


Overall our theories dealing with continuity and differentiation have been 
influenced quite significantly by the corresponding formalization in HOL. We 
also reproduce, for example, the HOL theorem for proof by bisection [6]. The 
lemmas set up by Harrison to prove the theorem act as an invaluable guide to 
the corresponding mechanization in Isabelle. Proof by bisection is often used in 
standard analysis to prove important theorems such as the Intermediate Value 
Theorem; it is a useful tool to have, even in a framework like ours that uses a 
combination of standard and nonstandard techniques, as this allows us an extra 
amount of flexibility when proving important theorems. 
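
As a computational analogue of proof by bisection, the sketch below halves an interval while maintaining the invariant f(lo) <= y <= f(hi), which is the idea underlying the standard proof of the Intermediate Value Theorem. It is an illustration only and not the HOL or Isabelle formalization; the name bisect_root and the tolerance parameter tol are assumptions made here.

```python
def bisect_root(f, a, b, y, tol=1e-12):
    """Approximate an x in [a, b] with f(x) = y by repeated halving.

    Assumes f is continuous on [a, b] and f(a) <= y <= f(b).
    """
    if not (a <= b and f(a) <= y <= f(b)):
        raise ValueError("hypotheses of the Intermediate Value Theorem not met")
    lo, hi = a, b
    # Invariant maintained by the loop: f(lo) <= y <= f(hi).
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) <= y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Example: solve x * x = 2 on [0, 2].
root = bisect_root(lambda x: x * x, 0.0, 2.0, 2.0)
```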

Indeed, apart from proof by bisection, we can use the fact that many funda- 
mental results of standard real analysis have intuitively appealing proofs using 
NSA to carry out alternative formalizations in Isabelle. For example, the Inter- 
mediate Value Theorem: 


[| a ≤ b; f(a) ≤ y; y ≤ f(b); ∀x. a ≤ x ∧ x ≤ b ⟶ isCont f x |]
==> ∃x. a ≤ x ∧ x ≤ b ∧ f(x) = y


and the Extreme Value Theorem both have nice infinitesimal geometric proofs 
that rely on the notion of a partition [8,7,9]. This concept is formalized expli- 
citly in Isabelle and, essentially, involves splitting a closed interval into infinitely 
many subintervals of equal infinitesimal lengths. The construction of partitions 
provides a particularly useful tool in infinitesimal calculus: it can also be used, 
with a slight modification, to provide a nonstandard treatment of the Riemann 
integral, for example [9]. We will not expand further on this, and simply men- 
tion that the other significant theorems formalized in the theory include Rolle’s 
theorem and the Mean Value Theorem. 


5.3 Power Series and Transcendental Functions 


The development of power series and transcendental functions in Isabelle is to 
a large extent carried out through standard rather than nonstandard processes.
One of the reasons for this decision is that NSA does not seem, at first sight, to
bring much in terms of simplification to the theory: the same techniques (e.g. 
convergence tests) are used in many cases for both standard and nonstandard 
proofs [7]. Our Isabelle development treats the HOL formalization as a useful 
blueprint; this enables us to focus on getting the main results that we want, 
namely the transcendental functions. 

We prove similar results to Harrison in HOL [6], with the same main theorem 
about the term by term differentiation of power series from Burkill and Burkill 
[3]. On the formalization of the theorem in HOL, Harrison notes that this is 
“perhaps the most difficult single proof in the whole development of analysis”. 
So, it is probably not surprising to note that this theorem produces the longest, 
and definitely the most complex, proof of our own development (the proof is 90 
steps long!). 

Once the basic properties of power series are established, we proceed to de- 
fine a few of the well-known and fundamental transcendental functions. These 
include the exponential and trigonometric functions, as well as their inverses. 
The exponential function is of central importance and is defined as the function 
sum of the power series [7]: 


$$\exp(x) = \sum_{i=0}^{\infty} \frac{x^i}{i!}$$


The formulation of the exponential function thus requires the factorial function,
which is easily formalized using Isabelle’s primrec package, and denoted by fact.
We use the nonstandard equivalence theorem (6) to define the exp function:³


exp :: real => real                                        (9)
exp x ≡ str (sumhr (0, ω, λn. x^n/(fact n)))


The ratio test is used to show the convergence, and hence boundedness, of the 
infinite sum used to define exp. With the help of theorem: 


NSBseq X ==> Abs_hypreal (hyprel^^{X}) ∈ Finite            (10)


which states that every bounded real sequence defines a finite hyperreal, we
deduce that:

sumhr (0, ω, λn. x^n/(fact n)) ∈ Finite


which means that the infinite sum (9) given above is well-defined and does
produce, by the Standard Part Theorem, a real value as result. Theorem (10)
reflects the close relationship existing between sequences and hyperreal numbers: 
it holds because the development of the hyperreals is based on the use of sequen- 
ces of real numbers. This relation is further illustrated by the next theorems, 
also formalized in Isabelle: 


³ For clarity, we do not show the embedding function real_of_nat used to embed the
natural number returned by fact in the reals.

- If (a_n) converges to zero then [(a_n)] is an infinitesimal.
- If (a_n) is an unbounded sequence then [(a_n)] is an infinite hyperreal.


Returning to our exposition, termwise differentiation of the exponential function 
shows that


d/dx exp(x) = str (sumhr (1, ω, λn. x^(n−1)/(fact (n − 1))))
            = str (sumhr (0, ω, λn. x^n/(fact n))) = exp(x)


We prove the various properties of the exponential function, such as the addition 
theorem exp(x) · exp(y) = exp(x + y) and that exp(0) = 1. We also define cos
and sin in Isabelle. The latter is defined as follows: 


sin x ≡ str (sumhr (0, ω, λn. (if even n then 0
             else ((−1)^((n − 1) div 2))/(fact n)) · x^n))


The power series for the trigonometric functions are straightforwardly shown 
to converge by means of the comparison test. Properties like sin(0) = 0 and
cos(0) = 1, and the derivative results for sin and cos are all easily proved with 
the help of Isabelle’s simplifier. 
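
The power-series definitions above can be mirrored by finite partial sums. The sketch below is an illustration under the assumption that a finite number of terms stands in for the infinite hypernatural bound in sumhr; the names exp_partial, sin_coeff and sin_partial are hypothetical, and the coefficient rule follows the sin definition given above (0 for even n, (−1)^((n − 1) div 2)/(fact n) for odd n).

```python
import math


def exp_partial(x, terms):
    """Sum of x**n / n! for n = 0 .. terms - 1."""
    return sum(x ** n / math.factorial(n) for n in range(terms))


def sin_coeff(n):
    """Coefficient of x**n in the sin series: 0 for even n, alternating sign over n! for odd n."""
    if n % 2 == 0:
        return 0.0
    return (-1) ** ((n - 1) // 2) / math.factorial(n)


def sin_partial(x, terms):
    """Sum of sin_coeff(n) * x**n for n = 0 .. terms - 1."""
    return sum(sin_coeff(n) * x ** n for n in range(terms))
```

With around twenty terms the partial sums already agree with the library functions to within floating-point noise for small arguments, which reflects the rapid convergence established by the ratio and comparison tests.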

The way the properties of the various transcendental functions and their 
respective inverses are mechanized is very much through a re-construction of the 
HOL proofs in Isabelle. We even make use of a technique invented by Harrison 
to prove identities of the form ∀x. f(x) = g(x) [6] and derive the results that
we need. We ease our mechanization by implementing a simple tactic that tries 
to solve goals involving derivatives of functions automatically through backward 
proof steps followed by simplification. 

The formalization of this last theory takes the formal development of ana- 
lysis in Isabelle to a stage which, we hope, makes it suitable for interesting 
applications. We still have much of the theory of integration to develop, and 
the nonstandard approach promises to be useful for this. In what follows, we 
describe a few further aspects of our work that become possible because we have 
a formalization of NSA rather than purely standard analysis. 


6 Infinitesimal and Infinite Reasoning 


As expected, NSA in Isabelle provides a nice framework in which one can prove 
theorems involving the infinitely small rigorously. Very often, one sees in text- 
books statements such as 


sin(θ) = θ where θ is infinitely small


This is rarely given any further justification: the reader needs to rely on her 
knowledge of the sine function and on her intuition about what infinitely small 
means to see that the statement is indeed plausible. In NSA, however, an infinitely
small number becomes a well-defined entity which can be manipulated like
other more familiar types of numbers. As a result, many statements, including
the one above, can then be proved in a rigorous yet intuitive way. We now give 
a brief proof of the statement, as mechanized in Isabelle. 

Using the nonstandard formulation of derivative formalized by theorem (7), 
and the standard results that 


cos(0) = 1, sin(0) = 0, and DERIV(x) λx. sin(x) :> cos(x),


we can easily prove that sin*(θ) ≈ θ for all infinitesimal θ.
Proof:


if θ = 0: This is trivial since ≈ is reflexive.
else if θ ≠ 0: Since DERIV(x) λx. sin(x) :> cos(x), for all x, we have that


DERIV(0) λx. sin(x) :> cos(0)
⟹ ∀h ∈ Infinitesimal − {0}. (sin*(0 + h) − sin(0))/h ≈ 1
⟹ (sin*(0 + θ) − sin(0))/θ ≈ 1
⟹ sin*(θ)/θ ≈ 1
⟹ sin*(θ) ≈ θ


One important point to note is that we made use of theorem (3) to reach the
final step. In a similar fashion, we also prove that cos*(θ) ≈ exp*(θ) ≈ 1 and,
interestingly, that tan*(π/2 + θ) ∈ Infinite, for all infinitesimal θ.


The above is just one possible example of formal reasoning about the in- 
finitely small. The framework also enables us to investigate infinite processes, 
and check that their (asymptotic) results are infinitely close to the (ideal) ones 
expected. We can illustrate this point with an example. 

Newton’s iteration method for root approximation can be used to define the 
square-root operation on positive reals. In Isabelle, assuming
that real a > 0, we define the following function: 


consts square_root :: [nat, real] => real

primrec

square_root 0 a = a + 1

square_root (Suc n) a = (square_root n a + a/square_root n a)/2


The square_root function defined above is expected to produce closer and closer
approximations to the desired square root, that is, the sequence of values
computed should converge towards the square root. Cast within our nonstan- 
dard framework, this means that the value computed by the function after an 
infinite number of steps is infinitely close to the square root. This motivates the 
following definition in Isabelle: 


sqrt :: real => real
sqrt a ≡ str ((*fNat* (λn. square_root n a)) ω)

The square root function is said to be defined as the hyperfinite approximation
of the Newton method iteration: we make use of the infinite hypernaturals and
consider the value computed at the infinite step ω.

It is easy to prove that the sequence λn. square_root n a is bounded for a
given a (e.g., by a + 1), and hence, that *fNat* (λn. square_root n a) returns a
finite hyperreal at the ω step which, by the Standard Part Theorem, is infinitely
close to the real result required. We show that our formalization does define the 
square root function by proving that:


0 < a ==> ((*fNat* (λn. square_root n a)) ω)² ≈ hypreal_of_real a


that is:
0 < a ==> sqrt a · sqrt a = hypreal_of_real a


The various properties of the square root function are all easily proved using the 
definition. This formalization was done as an experiment in Isabelle. The main 
formalization of square root is carried out as a special case of n-th roots, whose 
existence is one of the important theorems in our theory of sequences. 
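
The primrec definition of square_root above translates directly into an iterative function. The sketch below is an illustration only: a finite iteration count n stands in for the infinite hypernatural step, and floating-point numbers stand in for the reals, so it shows the convergence behaviour rather than the nonstandard construction itself.

```python
def square_root(n, a):
    """n-th Newton iterate for the square root of a > 0, started at a + 1."""
    x = a + 1.0                # square_root 0 a = a + 1
    for _ in range(n):
        x = (x + a / x) / 2.0  # square_root (Suc n) a
    return x


# The first few iterates for a = 2 decrease from a + 1 towards the square root.
iterates = [square_root(k, 2.0) for k in range(6)]
```

The iterates stay below the starting value a + 1, which is the bound mentioned above, and after a few dozen steps their square agrees with a to within rounding error.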


7 Conclusion 


The automated theorem proving community has shown some rather limited interest
in NSA so far. Ballantyne and Bledsoe implemented a prover using nonstandard
techniques in the late seventies [1]. Their work basically involved substituting
(without proof) any theorem in the reals ℝ by its analogue in the
extended reals ℝ* and proving it in this new setting. Even though the prover
had many limitations, and the work was just a preliminary investigation, the 
authors argued that through the use of nonstandard analysis, they had brought 
some new and powerful mathematical techniques to bear on the problem. More 
recently, Beeson developed a restricted axiomatic version of NSA using the logic 
of partial terms and used it to ensure the correctness of symbolic computations 
in his calculus system Mathpert [2]. In both works, the properties of the infinitely
close relation, standard parts, infinitesimals and so on are asserted as axioms. 
Our strictly definitional approach to the mechanization of NSA, in effect, veri- 
fies the axioms that were built into both Ballantyne and Bledsoe’s prover and 
Beeson’s Mathpert. 

In this work, we always formalize both the standard and nonstandard definiti- 
ons of concepts and then prove their equivalence as theorems. These equivalence 
results are essential because they allow us to provide legitimate nonstandard 
proofs of familiar properties. In current mathematical practice, the standard de- 
finitions are the ones that are in widespread use, so having just nonstandard 
definitions without any justification might be viewed as objectionable. However, 
with the success and widening acceptance of NSA, it may be that in the future 
the so-called nonstandard definitions will become the established ones. 

One of the main considerations during our formalization was to provide theories
that would be on a par with those developed in HOL. The extensive HOL
development has provided much of the initial motivation and subsequent drive
necessary for our own work. Our theory of transcendental functions is the one 
that owes most to Harrison’s salient work, as it stands almost wholly on the 
shoulders of HOL. 

Our hope is for Isabelle to become a powerful framework for the mechaniza- 
tion of non-trivial problems involving continuous mathematics. The nonstandard 
numbers, just like the reals, have application in floating point error analysis, for 
example. Indeed, NSA has been used to develop theoretical techniques (the
so-called asymptotic methods) for the formal verification of mathematical software
[14]. With the advanced framework now established in Isabelle, this is an
interesting and promising area of application which we hope to investigate next. 

As a final note, we remark that the effort of formalizing both standard and 
nonstandard tools is not simply an exercise: potential users will have the freedom 
either to stick with classical (standard) techniques, use nonstandard ones, or a 
combination of both. We hope that they will experience the benefits that we feel 
infinitesimals and other nonstandard numbers can bring to mechanization.

Abstract. We modify the reflection method to enable it to deal with
partial functions like division. The idea behind reflection is to program 
a tactic for a theorem prover not in the implementation language but in 
the object language of the theorem prover itself. The main ingredients 
of the reflection method are a syntactic encoding of a class of problems, 
an interpretation function (mapping the encoding to the problem) and a 
decision function, written on the encodings. Together with a correctness 
proof of the decision function, this gives a fast method for solving pro- 
blems. The contribution of this work lies in the extension of the reflection 
method to deal with equations in algebraic structures where some func- 
tions may be partial. The primary example here is the theory of fields. 
For the reflection method, this yields the problem that the interpretation 
function is not total. In this paper we show how this can be overcome by 
defining the interpretation as a relation. We give the precise details, both 
in mathematical terms and in Coq syntax. It has been used to program 
our own tactic ‘Rational’, for verifying equations between field elements. 


1 Introduction 


We present a method for proving equations between field elements (e.g. real 
numbers) in a theorem prover based on type theory. Our method uses the reflection
method as discussed in [6,5]: we encode the set of syntactic expressions
as an (inductive) data type, together with an interpretation function ⟦−⟧ that
maps the syntactic expressions to the field elements. Then one writes a
‘normalization’ function N that simplifies syntactic expressions and one proves that
this function is correct, i.e. if N(t) = q, then the interpretations of t and q (⟦t⟧
and ⟦q⟧) are equal in the field. Now, to prove an equality between field elements
a and b, one has to find syntactic expressions t₁ and t₂ such that N(t₁) = N(t₂)
and ⟦t₁⟧ is a and ⟦t₂⟧ is b. This method has been applied successfully [2] to ring
expressions in the theorem prover Coq, where it is implemented as the ‘Ring
tactic’: when presented with a goal a = b, where a and b are elements of a ring,
the Ring tactic finds the underlying syntactic expressions for a and b, executes 
the normalization function and checks the equality of the normal forms. 

The application of the reflection method to the situation of fields poses one 
big extra problem: syntactic expressions may not have an interpretation, e.g. 1/0.
So, there is no interpretation function from the syntactic expressions to the actual
field (⟦−⟧ would be partial). The solution that we propose here is to write an
interpretation relation instead: a binary relation between syntactic expressions
and field elements. Then we prove that this relation is a partial function. The
precise way of using this approach is discussed below, including the technical 
details of its implementation in Coq. For the precise encodings in Coq see [4]. 


The Reflection Method in General 


Reflection is the method of ‘reflecting’ part of the meta language in the object 
language. Then meta theoretic results can be used to prove results from the 
object language. Reflection is also called internalization or the two level approach:
the meta language level is internalized in the object language. The reflection
method can (and it has, see e.g. [7]) be used in general in situations where
one has a specific class of problems with a decision function. It is also not just 
restricted to the theorem prover Coq. If the theorem prover allows (A) user 
defined (inductive) data types, (B) writing executable functions over these data 
types and (C) user defined tactics in the meta language, then the reflection 
method can be applied. The classes of problems that it can be applied to are those 
where (1) there is a syntactic encoding of the class of problems as a data type, 
say via the type Problem, with (2) a decoding function ⟦−⟧ : Problem → Prop
(where Prop is the collection of propositions in the language of our theorem
prover), (3) there is a decision function Dec : Problem → {0,1} such that (4) one
can prove ∀p:Problem((Dec(p) = 1) → ⟦p⟧). Now, if the goal is to verify whether
a problem P from the class of problems holds, one has to find a p : Problem such
that ⟦p⟧ = P. Then Dec(p) (together with the proof of (4)) yields either a proof
of P (if Dec(p) = 1) or it ‘fails’ (if Dec(p) = 0 we obtain no information about
P). Note that if Dec is complete, i.e. if ∀p:Problem((Dec(p) = 1) ↔ ⟦p⟧), then
Dec(p) = 0 yields a proof of ¬P. The construction of p (the syntactic encoding)
from P (the original problem) can be done in the implementation language of 
the theorem prover. Therefore it is convenient that the user has access to this 
implementation language; this is condition (C) above. If the user has no access 
to the meta language, the reflection method still works, but the user has to 
construct the encoding p himself, which is very cumbersome. 

In this paper we first explain the reflection method by looking at the example 
of numbers with multiplication. We point out precisely which are the essential 
ingredients. Then we extend the example by looking at numbers with multi- 
plication and division. Here the partiality problem arises. We explain how the 
reflection method can be applied to this example. This is an illustration of what 
we have implemented in Coq: a tactic for solving equations between elements of 
a field (a set with multiplication, division, addition, subtraction, constants and 
variables). The tactic has been applied successfully in a formalization of real 
numbers in Coq that we are currently working on. 


2 Equational Reasoning Using the Reflection Method 


We explain the reflection method by the simple example of numbers with
multiplication. Suppose we have F : Set, · : F → F → F, 1 : F and an equivalence
relation =_F on F (either a built-in equality of the theorem prover or a user
defined relation) such that

(i) =_F is a congruence for · (i.e. if a =_F b and c =_F d, then a · c =_F b · d),

(ii) · is associative and commutative,
(iii) 1 is the unit with respect to ·.

Phrased differently, (F, ·, 1) is an Abelian monoid. When dealing with F, we will
want to prove equations like


(a · c) · (1 · (a · b)) =_F (a · a) · (b · c)                    (1)

where a, b, c are arbitrary elements of F. To prove this equation in a theorem
prover each of the properties (i)-(iii) above has to be used (several times). It is
possible to write a ‘tactic’ in the theorem prover that does just that:


Apply each of the steps (i)-(iii) to rewrite the left and right hand side 
of equation (1) until the two sides of the equation are literally the same. 


Obviously this is not a very smart tactic (e.g. it does not terminate when the 
equality does not hold) and of course we can do better than this by applying 
(i)—(iii) in a clever order. For the case of Abelian monoids, this can be done by 
rewriting all terms into a normal form which has the shape 


a₁ · (a₂ · (… · (aₙ · 1)…))


where n ≥ 0 and a₁, …, aₙ are elements of F that can not be decomposed,
listed in alphabetic order. So aᵢ may be a variable of type F or some other term
of type F that is not of the form − · − or 1. A tactic, which is written in the
meta language, has access to the code of aᵢ, hence it can order the aᵢ according
to some pre-defined total order, say the lexicographic one. (Note that a normal
form as above can not be achieved via a term rewrite system, because we have
to order the variables.) So, a more clever tactic does the following.


Rewrite the left and right hand side of equation (1) to normal form and 
check if the two sides of the equation are literally the same. 


Following [5], there are three ways to augment the theorem prover with this 
proof technique for equational reasoning. 


1. Add it to the primitives of the theorem prover, 

2. Write (in the meta language) a tactic, built up from basic primitive steps, 
that performs the normalization and checks the equality. 

3. Write a normalization function in the language of the theorem prover itself 
and prove it correct inside the theorem prover; use this as the core of the 
tactic. 


The first is obviously undesirable in general, as it gives no guarantee that the 
method is correct (one could add any primitive rule one likes). The second and 
third both have their own pros and cons, which are discussed extensively in [5]. 
It is our experience (and of others, see [2]) that especially for theorem provers 
based on type theory, the third method is the most convenient one if one wants 
to verify a large number of problems from one and the same class. We will
motivate why. 


Equational Reasoning via Partial Reflection 165 


Reflection in Type Theory 


We still work with the Abelian monoid (F, ·, 1) from before and we want to verify
equation (1). The equality on this monoid will be denoted by =_F, which may
be user defined or not, as long as it is an equivalence relation and a congruence
for ·. Note that there is also the definitional equality, built into Coq. This is
usually denoted as =βδι, as it is generated from the literal (α-) equality by adding
the computation steps β, δ (for unfolding definitions) and ι (for recursion).
Definitional equality is decidable and built into the type checker; it is included
in the equality =_F (if two terms are definitionally equal, they are equal in any
respect).
We introduce an inductive type of syntactic expressions, E, by


E ::= V | C | E * E

where V is the type of variables, let’s take

V := ℕ


and C is the type of constant expressions, containing in this case just one element,
u. In type theory (using Coq syntax) the definition of V and E would be as
follows. 


Definition V : Set := nat. 


Inductive E : Set := 
evar : V->E 
| eone : E 
| emult : E->E->E. 


To define the semantics of an expression e : E, we need a valuation ρ : V → F
to assign a value to the variables. The interpretation function connecting the level
of the syntactic expressions E and the semantics F is then defined as usual by
recursion over the expression.


⟦−⟧ρ : E → F


In Coq syntax the interpretation function I is defined as follows, given the Abe- 
lian monoid <F, fmult, fone>: 


Variable rho : V->F. 


Fixpoint I [e:E] : F := 
Cases e of 
(evar v) => (rho v) 
| eone => fone 
| (emult e1 e2) => (fmult (I e1) (I e2))
end.

Now we write a ‘normalization function’:

N : E → E


that sorts variables, removes the unit (apart from the tail position) and associates
brackets to the right. We don’t give its encoding N : E -> E in Coq, but give the
following examples.


N((v₀ * u) * (v₁ * v₂)) =βδι (v₀ * (v₁ * (v₂ * u))),
N((v₂ * v₀) * v₁) =βδι (v₀ * (v₁ * (v₂ * u))).


The equality =βδι is the internal (computational) equality of the theorem prover:
no proof is required for its verification; a verification of such an equality is 
performed by the type checker. 
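
Since the encoding of N is not given, the following is a minimal sketch of one possible normalization function, written in Python rather than Coq and therefore carrying no proof of correctness. The class names Var, One and Mult and the functions interp, variables and normalize are hypothetical; the monoid is assumed to be the integers under multiplication so that the interpretation can be computed.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    index: int          # corresponds to evar


@dataclass(frozen=True)
class One:
    pass                # corresponds to eone (the constant u)


@dataclass(frozen=True)
class Mult:
    left: "Expr"        # corresponds to emult
    right: "Expr"


Expr = Union[Var, One, Mult]


def interp(rho, e):
    """Interpretation of e in the monoid (int, *, 1) under the valuation rho."""
    if isinstance(e, Var):
        return rho(e.index)
    if isinstance(e, One):
        return 1
    return interp(rho, e.left) * interp(rho, e.right)


def variables(e):
    """Variable indices of e from left to right; occurrences of the unit are dropped."""
    if isinstance(e, Var):
        return [e.index]
    if isinstance(e, One):
        return []
    return variables(e.left) + variables(e.right)


def normalize(e):
    """Sort the variables and nest products to the right, with one unit in tail position."""
    result = One()
    for i in sorted(variables(e), reverse=True):
        result = Mult(Var(i), result)
    return result


# The two examples from the text.
e1 = Mult(Mult(Var(0), One()), Mult(Var(1), Var(2)))
e2 = Mult(Mult(Var(2), Var(0)), Var(1))
normal_form = Mult(Var(0), Mult(Var(1), Mult(Var(2), One())))
```

Structural equality of the resulting values plays the role that the computational equality plays in the type checker: both examples normalize to the same value, and the interpretation is unchanged by normalization for any integer valuation.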

We prove the following key lemma for the normalization function. 


normcorrect : ⟦e⟧ρ =_F ⟦N(e)⟧ρ

In Coq terminology: we construct a proof term

normcorrect : (rho: V -> F)(e:E)((I rho e) = (I rho (N e))).


The situation is depicted in the following diagram; normcorrect states that the
diagram commutes.


         N
    E ------> E
    |         |
  ⟦−⟧       ⟦−⟧
    |         |
    v         v
    F ------- F
        =_F


Solving equation f =_F f′ with f and f′ elements of F now amounts to the
following.


- Find (by tactic) e, e′ and ρ with
  ⟦e⟧ρ =βδι f and ⟦e′⟧ρ =βδι f′
- Check (by type checker) whether
  N(e) =βδι N(e′)

The proof of f =_F f′ is then found by


f =βδι ⟦e⟧ρ =_F ⟦N(e)⟧ρ =βδι ⟦N(e′)⟧ρ =_F ⟦e′⟧ρ =βδι f′

from normcorrect for e and e′ and trans of =_F. In a diagram:


         N           N
    E ------> E <------ E
    |         |         |
  ⟦−⟧       ⟦−⟧       ⟦−⟧
    |         |         |
    v         v         v
    F ------- F ------- F
        =_F       =_F

In Coq this means that we have to construct a proof term of type

f = f'


This is done from normcorrect using the proofs of symmetry and transitivity 
of =_F, sym and trans.


sym : (x,y:F) (x = y) -> (y = x).


trans : (x,y,z:F) (x = y) -> (y = z) -> (x = z).

The crucial point is that

(normcorrect rho e) : ((I rho e) = (I rho (N e))).
(normcorrect rho e') : ((I rho e') = (I rho (N e'))).


can only be fitted together using trans, when (N e) and (N e') are
βδι-convertible. In that case we find that (I rho (N e)) is βδι-convertible with
(I rho (N e')) as well, so if we call that g by defining:


g := (I rho (N e))

then we find that:


(normcorrect rho e) : (f = g).
(normcorrect rho e') : (f' = g).


So using this, we can construct a proof term


(trans f g f' (normcorrect rho e) (sym f' g (normcorrect rho e')))
: f = f'.


The important points to note here are 

(1) This proof term of an equality has a relatively small size, compared to a
proof term that is spelled out completely in terms of congruence (of =_F w.r.t. ·)
and reflexivity, symmetry and transitivity (of =_F). The terms refl, sym, trans,
and normcorrect are just defined constants. The terms rho, e and e′ are generated
by the tactic; rho being of size linear in f and f′ with a rather small constant.
A proof term that is completely spelled out has a polynomial size in f and f′.

If we unfold the definitions, we observe that the bulk of the proof term is in
normcorrect. This will be rather large but it only has to be extended with a part
of roughly the size of the input elements themselves. So, then the proof term
is still linear in the size of the input terms.

(2) Checking this proof term (i.e. verifying whether it has the type f = f’) 
can in general take rather long. This is because type checking now involves 
serious computation, as we use the language of the theorem prover as a small 
programming language. The bulk of the work for the type checker is in verifying 
whether (N e) and (N e') are βδι-convertible.

We compare this to the approach of using a tactic that is written completely
in the meta language. This tactic will do roughly the same thing as our reflection
method: reduce expressions to normal form and generate step by step a proof 
term that verifies that this reduction is correct. Checking such a proof term will 
take about the same time. Some increase in speed may only be gained if we check 
a user generated proof term, because this will (in general) avoid reducing to full 
normal form (assuming the user sees the possible ‘shortcuts’). 

(3) Generating the proof term is very easy, both for the reflection method 
and for the tactic written in the meta language. The tactics generate the full 
proof term without further interaction. Note that a completely user generated 
proof term of an equality (which may be fastest to type check, see above), is not 
realistic. 

Here we also see why the reflection approach is particularly appealing for 
theorem provers based on type theory: one has to construct a proof term, which 
remains relatively small using reflection. Moreover, these theorem provers pro- 
vide the required programming language to encode the normalization and inter- 
pretation functions in. 

Looking back at the example from the beginning, encoded in Coq, we have 
as goal 


Goal 
((fmult (fmult a c) (fmult fone (fmult a b))) 
= (fmult (fmult a a) (fmult b c))). 


Now the tactic generates 


(emult (emult (evar 0) (evar 2)) 
       (emult eone (emult (evar 0) (evar 1)))). 
(* the e : E *) 
(emult (emult (evar 0) (evar 0)) (emult (evar 1) (evar 2))). 
(* the e’ : E *) 


and a function rho : V -> F which is defined in such a way that 


(rho (evar 0)) = a 
(rho (evar 1)) = b 
(rho (evar 2)) = c 


Then it constructs a term as above, 


(trans f g f’ (normcorrect rho e) (sym f’ g (normcorrect rho e’))) 

where g is (I rho (N e)). Note that (I rho (N e)) $=_{\beta\delta\iota}$ (I rho (N e’)) 
$=_{\beta\delta\iota}$ 

(I rho (emult (evar 0) (emult (evar 0) 
(emult (evar 1) (emult (evar 2) eone))))) 

$=_{\beta\delta\iota}$ (fmult a (fmult a (fmult b (fmult c fone)))). This term is given to 
the type checker. If it type checks with the goal as its type, the tactic succeeds (and 
it has constructed a proof term proving the goal); if the type check fails, the 
tactic fails. 
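
The example can be replayed outside the theorem prover. The following is a minimal Python sketch, not the Coq development: the tuple encoding of expressions and the names `interp` and `normalize` are assumptions made for illustration, and rational numbers stand in for the carrier F. It builds the two expressions e and e’ from the text, normalizes them to a sorted right-nested product, and compares the results.

```python
from fractions import Fraction

# Hypothetical encoding of E: ("var", i), ("one",), ("mult", e1, e2).
EONE = ("one",)

def evar(i):
    return ("var", i)

def emult(a, b):
    return ("mult", a, b)

def interp(rho, e):
    """Counterpart of (I rho e): evaluate e under the valuation rho."""
    if e[0] == "var":
        return rho[e[1]]
    if e[0] == "one":
        return Fraction(1)
    return interp(rho, e[1]) * interp(rho, e[2])

def variables(e):
    """Variable indices occurring in e, left to right."""
    if e[0] == "var":
        return [e[1]]
    if e[0] == "one":
        return []
    return variables(e[1]) + variables(e[2])

def normalize(e):
    """Counterpart of (N e): sorted, right-nested product ending in eone."""
    result = EONE
    for i in sorted(variables(e), reverse=True):
        result = emult(evar(i), result)
    return result

# The e and e' generated by the tactic in the text (a, b, c are variables 0, 1, 2).
e1 = emult(emult(evar(0), evar(2)), emult(EONE, emult(evar(0), evar(1))))
e2 = emult(emult(evar(0), evar(0)), emult(evar(1), evar(2)))
```

Comparing `normalize(e1)` with `normalize(e2)` plays the role of the convertibility check done by the type checker; in Coq the comparison is what makes the proof term well-typed, whereas here it is only a runtime test.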


3 Reflection with Partial Operations 


We explain partial reflection by adapting the example to include division. We 
view division as a ternary operation: 

\[ a \div b \mathbin{/\!/} p \quad \text{with $p$ a proof of } b \neq_F 0. \]

This is very much a type theoretic view. One may alternatively write 

\[ a \div b \quad \text{for } b \in \{ z \mid z \neq_F 0 \}, \]

but note that this also requires a proof of $b \neq_F 0$ before $\div$ can be applied to it. 

As a side remark, we note that we use the principle of irrelevance of proofs 
when extending the equality on $F$ to expressions of the form $a \div b \mathbin{/\!/} p$. That is, 
if $p$ and $p'$ are both proofs of $b \neq_F 0$, then $(a \div b \mathbin{/\!/} p) =_F (a \div b \mathbin{/\!/} p')$. In our 
encoding in Coq, this is achieved by representing $\{ z \mid z \neq_F 0 \}$ by the type Pos 
of pairs $(b, p)$ with $p : (b \neq_F 0)$, with the equality on Pos the one inherited from 
$F$. Then we let $\div$ be a function from $F \times \mathrm{Pos}$ to $F$. 

If we extend our structure with a zero element and a division operator, like 
in fields, we encounter the problem of undefined elements. These cause trouble 
in various places. First of all, there is the question of which syntactic expressions 
one allows: if 1/0 is accepted, which interpretation does it have (one has to 
choose one). This is of course related to the question whether the theorem prover 
allows one to write down 1/0 (whatever its meaning may be). The second problem is 
that a naive normalization function might rewrite $0/(0/v)$ to $v$ (just because 
$a/(a/v) = a/a * v = 1 * v = v$). But then $0/(0/v) = v$, which is undesirable. Note that 
the ‘division by 0’ problem can occur in a more disguised form, e.g. in $(a \cdot y)/y = a$, 
with $y$ a variable, which is correct under the side-condition that $y \neq_F 0$. So, 
it seems that, when normalizing an expression $e$, one would have to take the 
interpretation $[\![e]\!]_\rho$ into account (and the interpretation of subexpressions of $e$) 
to verify that the normalization steps are correct. 

We have solved the problems just mentioned by 

- Allowing syntactic expressions (like 1/0) that have no interpretation. So 
  $[\![-]\!]_\rho$ is defined as a relation, for which it has to be proved that it is a partial 
  function. 

- Writing the normalization function $N$ in such a way that, if expression $e$ has 
  an interpretation, then expression $N(e)$ has the same interpretation as $e$. 

Syntactic expressions. We now define the inductive type of syntactic expressions, 
$E$, by 

\[ E ::= V \mid C \mid E * E \mid E / E \]

where $V$ is again the type of variables, for which we take $V ::= \mathbb{N}$ again. $C$ is 
the type of constant expressions, now containing a zero expression, $z$, and a one expression, 
$u$. In type theory (using Coq syntax): 


Inductive E : Set := 
    evar  : V->E 
  | eone  : E 
  | ezero : E 
  | emult : E->E->E 
  | ediv  : E->E->E. 


Note that E doesn’t depend on F and $\rho$; we have ‘light’ syntactic expressions 
(without any semantic information). This implies that 1/0 is allowed in E: it is 
a well-formed expression. 


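Interpretation relation. The semantics of an expression is now not given by a 
function but by an interpretation relation: 

\[ ]\!]_\rho \;\subseteq\; E \times F \]

Again, we need a valuation $\rho : V \to F$ to assign a value to the variables. The 
interpretation relation can then be defined inductively as follows. 

\[
\begin{array}{lcl}
v_n \;]\!]_\rho\; f & \text{iff} & \rho(n) =_F f, \\
u \;]\!]_\rho\; f & \text{iff} & f =_F 1, \\
z \;]\!]_\rho\; f & \text{iff} & f =_F 0, \\
(e_1 * e_2) \;]\!]_\rho\; f & \text{iff} & \exists f_1, f_2 \in F\, ((e_1 \;]\!]_\rho\; f_1) \wedge (e_2 \;]\!]_\rho\; f_2) \wedge (f =_F f_1 \cdot f_2)), \\
(e_1 / e_2) \;]\!]_\rho\; f & \text{iff} & \exists f_1, f_2 \in F\, ((e_1 \;]\!]_\rho\; f_1) \wedge (e_2 \;]\!]_\rho\; f_2) \wedge (f_2 \neq_F 0) \wedge (f =_F f_1 \div f_2)).
\end{array}
\]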


In Coq let there be given a structure <F, fmult, fdiv, fone, fzero>, 
with 


fdiv : (x,y:F)(~(y =_F fzero))->F 


and the other operations and the equality as expected. The inductive definition 
of $]\!]_\rho$ is as follows. 


Inductive I : E->F->Prop := 
    ivar  : (n:V)(f:F) ((rho n) = f) -> (I (evar n) f) 
  | ione  : (f:F) (fone = f) -> (I eone f) 
  | izero : (f:F) (fzero = f) -> (I ezero f) 
  | imult : (e,e’:E)(f,f’,f’’:F) 
            ((fmult f f’) = f’’) -> (I e f) -> (I e’ f’) 
            -> (I (emult e e’) f’’) 
  | idiv  : (e,e’:E)(f,f’,f’’:F)(nz:~(f’ = fzero)) 
            ((fdiv f f’ nz) = f’’) -> (I e f) -> (I e’ f’) 
            -> (I (ediv e e’) f’’). 


Note that we do not just let ione : (I eone fone), but take fone modulo the 
equality on F, and similarly for the constant, the variables and the two operators. 
This is because I should be a partial function modulo the equality on F. In more 
technical terms: correctness of normalization can only be proved with this version 
of I. 
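
Because the relation is a partial function, it can be mimicked by a function that may fail. The sketch below is an illustration under stated assumptions rather than a rendering of the Coq definition: expressions are encoded as tuples, rationals stand in for F, and `None` is used (a choice of this sketch) for "no f is related to e".

```python
from fractions import Fraction

def interp(rho, e):
    """Return the f related to e under rho, or None if e has no interpretation."""
    tag = e[0]
    if tag == "var":
        return rho[e[1]]
    if tag == "one":
        return Fraction(1)
    if tag == "zero":
        return Fraction(0)
    if tag not in ("mult", "div"):
        raise ValueError("unknown constructor: %r" % (tag,))
    left = interp(rho, e[1])
    right = interp(rho, e[2])
    if left is None or right is None:
        return None                      # an uninterpreted subexpression
    if tag == "mult":
        return left * right
    # Division is only interpreted when the divisor is different from 0.
    return None if right == 0 else left / right
```

For instance, `("div", ("one",), ("zero",))` is a well-formed expression for which `interp` returns `None`, matching the remark that 1/0 is allowed in E but has no interpretation.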


Normalization and correctness. The ‘normalization function’ 

\[ N : E \to E \]

now brings the expressions that have an interpretation into one of the following 
two normal forms 

\[ (v_1 * (v_2 * \cdots (v_n * u) \cdots)) \,/\, (w_1 * (w_2 * \cdots (w_m * u) \cdots)), \]

\[ z / u, \]

with $v_1, \ldots, v_n, w_1, \ldots, w_m$ variables and the two lists $v_1, \ldots, v_n$ and $w_1, \ldots, w_m$ 
disjoint. So, $N$ creates two mutually exclusive lists of sorted variables, one 
representing the numerator and one representing the denominator. The sorting 
of these lists is the same as for multiplicative expressions. In case $N$ encounters 
a $z$ in the numerator, the whole expression is replaced by $z/u$ (which has 
interpretation 0). For the expressions that do not have an interpretation (those 
$e \in E$ for which there are no $\rho : V \to F$, $f \in F$ with $e \;]\!]_\rho\; f$), the normalization 
function can return anything. 

We don’t give the encoding N : E -> E in Coq, but restrict ourselves to 
some examples. 


\[ N((v_0/(v_1/v_3)) * v_1) =_{\beta\delta\iota} (v_0 * (v_3 * u))/u, \]
\[ N((v_0/(v_1 * v_2))/(v_3/v_2)) =_{\beta\delta\iota} (v_0 * u)/(v_1 * (v_3 * u)). \]


We can understand the way $N$ actually works as follows. 


1. From an expression $e$, two sequences of variables and constants are created, 
   $s_1$ and $s_2$, the first representing the numerator and the second the denominator. 
   The intention is that, if $e$ has an interpretation, then $s_1/s_2$ has the 
   same interpretation. 

2. These two sequences are put in normal form, following the normalization 
   procedure for multiplicative expressions. 

3. Variables that occur both in $s_1$ and $s_2$ are canceled, units are removed and 
   $s_1$ is replaced by $z$ if it contains a $z$. 

Note that we tacitly identify a sequence $s_i$ with the expression that arises from 
consecutively applying $*$ to all its components. This is also the way we have 
implemented it in Coq: we do not use a separate list data structure, but encode 
it via $*$ and $u$. On these lists, we define an ‘append’ operation, which we 
denote by @. So, if $s_1$ and $s_2$ denote two expressions in multiplicative normal 
form, $s_1 @ s_2$ is the multiplicative normal form of $s_1 * s_2$. As a matter of fact, $N$ 
doesn’t do each of these steps sequentially, but in a slightly smarter (and faster) 
way. 

In proving the correctness of $N$, one has to preserve the property that all 
denominators are $\neq_F 0$. In that, the first step is the crucial one. (The second step 
is only a reordering of variables; one has to prove that this reordering preserves 
the $\neq_F 0$ property, which is easy. In the third step one has to prove that $\neq_F 0$ 
is preserved under cancellation, which is the case: if $a \cdot b \neq_F 0$, then $a \neq_F 0$.) 
The first step has a nice recursion: if $N(e) = (s_1, s_2)$ and $N(e') = (s_1', s_2')$, then 

\[ N(e * e') := (s_1 @ s_1',\; s_2 @ s_2'), \]
\[ N(e / e') := (s_1 @ s_2',\; s_2 @ s_1'). \]

Now, if $e * e'$ has an interpretation, then (by induction) $s_2$ and $s_2'$ have an 
interpretation different from 0 and hence the interpretation of $s_2 @ s_2'$ is different 
from 0. Similarly, if $e/e'$ has an interpretation, then (by induction) $s_2$, $s_1'$ and $s_2'$ 
have an interpretation different from 0 and hence the interpretation of $s_2 @ s_1'$ is 
different from 0. 
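
The three steps can be sketched in Python as follows. This is an illustrative reconstruction under assumptions, not the Coq function N (which the text does not give): Python lists replace the encoding of sequences via * and u, `Counter` performs the cancellation, and for an expression with a zero in the denominator (which has no interpretation) the sketch simply returns its input, since any result is acceptable there.

```python
from collections import Counter

ONE, ZERO = ("one",), ("zero",)

def split(e):
    """Step 1: return (s1, s2), the numerator and denominator factor lists."""
    tag = e[0]
    if tag == "one":
        return [], []
    if tag in ("var", "zero"):
        return [e], []
    n1, d1 = split(e[1])
    n2, d2 = split(e[2])
    if tag == "mult":
        return n1 + n2, d1 + d2          # (s1 @ s1', s2 @ s2')
    return n1 + d2, d1 + n2              # division: (s1 @ s2', s2 @ s1')

def product(factors):
    """Encode a list as a right-nested product ending in the unit u."""
    result = ONE
    for factor in reversed(factors):
        result = ("mult", factor, result)
    return result

def normalize(e):
    num, den = split(e)
    if ZERO in den:
        return e                         # no interpretation: anything will do
    if ZERO in num:
        return ("div", ZERO, ONE)        # the normal form z/u
    n = Counter(f[1] for f in num)
    d = Counter(f[1] for f in den)
    # Steps 2 and 3: cancel common variables (with multiplicity), then sort.
    num_vars = sorted((n - d).elements())
    den_vars = sorted((d - n).elements())
    return ("div",
            product([("var", i) for i in num_vars]),
            product([("var", i) for i in den_vars]))
```

Applied to the encodings of the two examples given earlier, `normalize` yields the encodings of $(v_0 * (v_3 * u))/u$ and $(v_0 * u)/(v_1 * (v_3 * u))$.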


This is also how the correctness proof of $N$ works: $N$ itself doesn’t have to 
bother about the interpretation of the expressions it operates on, because it is 
written in such a way that the fact that $e$ has an interpretation implies (in a 
rather simple way, sketched above) that $N(e)$ has an interpretation (which is 
the same as for $e$). 

Again we note that $N$ cannot be found as a term rewriting system, for one 
because it orders variables, but more importantly because it only works properly 
for expressions that have an interpretation. We can use this information, because 
the expression we start from is derived from an existing $f : F$, which is well-defined 
(otherwise we couldn’t write it down in the theorem prover). So, we 
already know that the first $e$ has an interpretation (namely $f$) and by virtue of 
the construction of $N$, this property is preserved. 


We prove the following key lemmas. 


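\[ \mathit{normcorrect} : e \;]\!]_\rho\; f \Rightarrow N(e) \;]\!]_\rho\; f \]
\[ \mathit{extensionality} : (e \;]\!]_\rho\; f) \wedge (e \;]\!]_\rho\; f') \Rightarrow f =_F f'. \]


Extensionality states that $]\!]_\rho$ is really a partial function (w.r.t. the equality 
$=_F$). 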

Reflection. The reflection method for solving $f =_F f'$ is now: 


- find (by tactic) $e$, $e'$ and $\rho$ with $e \;]\!]_\rho\; f$ and $e' \;]\!]_\rho\; f'$ 
- construct (see below) proof terms for these two statements 
- check (by type checker) whether $N(e) =_{\beta\delta\iota} N(e')$ 
  ($=_{\beta\delta\iota}$ means $\beta\delta\iota$-convertible) 


The proof of $f =_F f'$ is then found by: 

\[ e \;]\!]_\rho\; f \Rightarrow N(e) \;]\!]_\rho\; f \]
\[ e' \;]\!]_\rho\; f' \Rightarrow N(e') \;]\!]_\rho\; f' \]

from normcorrect (applied to $(e, f)$ and $(e', f')$, respectively) and extensionality 
(applied to $(N(e), f, f')$). 

Just as in the case for reflection in Section 2, a precise proof term can be 
constructed, which type checks with type $f =_F f'$ if and only if these terms 
can be shown to be equal in the equational theory. In the next Section we will 
exhibit such a proof term. The main work in type checking this proof term lies 
in the execution of the algorithm $N$ (but this is done by the type checker). 

One problem remains. As we now have an interpretation relation, there arise 
some proof obligations: it is not just enough to find encodings e and e’ of f and 
f’; we have to prove that they are encodings indeed. That is, we have as new 
goals 


\[ f =_F f' \]

\[ e \;]\!]_\rho\; f \quad\text{and}\quad e' \;]\!]_\rho\; f' \]


Of course, we don’t want the user to have to take care of these goals; the tactic 
should solve them. This problem is dealt with in the next Section. 


4 Proof Loaded Syntactic Objects 


At the second step of the partial reflection method, we need proofs of $e \;]\!]_\rho\; f$. 
One way is to let the tactic construct these; so from $f : F$, the tactic extracts 
both $e : E$ and $\rho$ and a proof term $p$ with $p : e \;]\!]_\rho\; f$. This is possible, but it is 
not what has been implemented. We have chosen to have one data type for both 
expressions and proofs. The strategy for doing so (and which fits very well with 
the type theoretic approach) is to create syntactic expressions with proof objects 
inside, 

\[ \bar{E} \]

with a forgetful function $|-|$ and an interpretation function $[\![-]\!]_\rho$, 

\[ |-| : \bar{E} \to E \]
\[ [\![-]\!]_\rho : \bar{E} \to F \]

The key property to be proved is then 

\[ |\bar{e}| \;]\!]_\rho\; [\![\bar{e}]\!]_\rho \]

But note that $\bar{E}$ depends on $F$ and $\rho$ (it should ‘know’ about semantics), so 
$\bar{E}$ is a type of ‘heavy’ syntactic expressions (including proof terms). This can 
only work if we let $\bar{E}$ be a dependent type over $F$: 

\[ \bar{E}_f \]

which in Coq terms is defined as: 


Inductive xE : F -> Set := 
    xevar  : (i:V)(xE (rho i)) 
  | xeone  : (xE fone) 
  | xezero : (xE fzero) 
  | xemult : (f,f’:F)(e:(xE f))(e’:(xE f’))(xE (fmult f f’)) 
  | xediv  : (f,f’:F)(e:(xE f))(e’:(xE f’))(nz:~(f’ = fzero)) 
             (xE (fdiv f f’ nz)). 


The type $\bar{E}_f$ represents the type of ‘heavy’ syntactic expressions whose 
interpretation is $f$. The interpretation function is now 

\[ [\![-]\!]_\rho : \bar{E}_f \to F \]

for which it should hold that 

\[ [\![\bar{e}]\!]_\rho =_{\beta\delta\iota} f \]

so $[\![-]\!]_\rho$ is constant on its domain. In Coq terms we define: 


xI := [f:F][e:(xE f)]f : (f:F)(xE f) -> F. 


Note that we do not define the interpretation by induction on e : (xE f), but we 
just return f (the intended interpretation). The obligation is now to prove that 
the underlying ‘light’ syntactic expression has indeed f as interpretation. The 
forgetful function, extracting the ‘light’ syntactic expression, now is 

\[ |-| : \bar{E}_f \to E \]

It maps the ‘heavy’ syntactic expressions to the ‘light’ ones. In Coq terms: 


Fixpoint xX [f:F; e:(xE f)] : E := 
  Cases e of 
      (xevar i) => (evar i) 
    | xeone => eone 
    | xezero => ezero 
    | (xemult f’ f’’ e’ e’’) => (emult (xX f’ e’) (xX f’’ e’’)) 
    | (xediv f’ f’’ e’ e’’ p) => (ediv (xX f’ e’) (xX f’’ e’’)) 
  end. 

which is defined by induction over (xE f). The maps $[\![-]\!]_\rho$ and $|-|$ ‘extract’ the 
two components (syntactic expression and semantic element) from the ‘heavy’ 
encoding. The key result now is that the second extraction is an interpretation 
of the first: 

\[ \mathit{extractcorrect} : \forall x \in \bar{E}_f\, (|x| \;]\!]_\rho\; [\![x]\!]_\rho) \]

which is just $\forall x \in \bar{E}_f\, (|x| \;]\!]_\rho\; f)$. 
The tactic now works as follows, given a problem f =p f’. 


- find (by tactic) $\bar{e} \in \bar{E}_f$, $\bar{e}' \in \bar{E}_{f'}$ and $\rho$ with 
  $|\bar{e}| \;]\!]_\rho\; f$ and $|\bar{e}'| \;]\!]_\rho\; f'$ 
- obtain (from extractcorrect) proof terms for these two statements 
- check (by type checker) whether $N(|\bar{e}|) =_{\beta\delta\iota} N(|\bar{e}'|)$ 


So, the tactic creates $e, e'$ of type $E$ indirectly by creating $\bar{e}, \bar{e}'$ of types 
$\bar{E}_f$, $\bar{E}_{f'}$. In a diagram the situation is now as follows. 


[Diagram: $\bar{E}_f$ and $\bar{E}_{f'}$ are mapped by $|-|$ to $E$ and by $[\![-]\!]_\rho$ to $F$; $N$ acts on $E$; the two values in $F$ are related by $=_F$.] 


The outside triangles commute due to extractcorrect; the large middle triangle 
commutes due to extensionality; the other two triangles commute due to 
normcorrect. If we make the proof term given by this method explicit, it is 


(extensionality rho ne f f’ 
  (normcorrect rho e f (extractcorrect rho f xe)) 
  (normcorrect rho e’ f’ (extractcorrect rho f’ xe’))) 
: f = f’ 


where xe and xe’ correspond to $\bar{e}$ and $\bar{e}'$, and where we have defined 


e := (xX f xe). 
e’ := (xX f’ xe’). 
ne := (N e). 


This term is only well-typed when (N e) is $\beta\delta\iota$-convertible with (N e’). 
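
The idea of a ‘heavy’ expression can be illustrated with a small Python sketch. It is only an analogy under assumptions: Python has no dependent types, so the index f of xE is stored as a field, the proof nz is replaced by a runtime check, and the names `Heavy`, `xevar`, `xemult` and `xediv` (as Python functions) are hypothetical stand-ins for the Coq constructors.

```python
from fractions import Fraction

def interp(rho, e):
    """Light interpretation; None where no value is related to e."""
    tag = e[0]
    if tag == "var":
        return rho[e[1]]
    if tag == "one":
        return Fraction(1)
    if tag == "zero":
        return Fraction(0)
    a, b = interp(rho, e[1]), interp(rho, e[2])
    if a is None or b is None:
        return None
    if tag == "mult":
        return a * b
    return None if b == 0 else a / b

class Heavy:
    """A light expression paired with the semantic value it was built from."""
    def __init__(self, light, value):
        self.light = light    # what the forgetful function returns
        self.value = value    # what the interpretation function returns (just f)

def xevar(rho, i):
    return Heavy(("var", i), rho[i])

def xemult(a, b):
    return Heavy(("mult", a.light, b.light), a.value * b.value)

def xediv(a, b):
    # Stand-in for the proof argument nz: the divisor must be nonzero.
    if b.value == 0:
        raise ZeroDivisionError("side condition b != 0 cannot be established")
    return Heavy(("div", a.light, b.light), a.value / b.value)
```

The analogue of extractcorrect is that, for every `Heavy` value `h` built with these constructors, `interp(rho, h.light)` equals `h.value`; in Coq this is a proved lemma, while here it can only be checked on examples.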

Normalizing Proof Loaded Objects 


In presence of the type $\bar{E}_f$, we could do without the type $E$ altogether. Then 
we would define a normalization function $\bar{N}$ to operate on the ‘heavy’ syntactic 
expressions of type $\bar{E}_f$. This is possible (and it yields a simpler diagram), but it 
is not desirable, because then the computation (reducing $\bar{N}(\bar{e})$ to normal form) 
becomes much heavier. Moreover, it would be more difficult to program $\bar{N}$ (having 
to take all the proof terms into account) and the two levels in the reflection 
approach would be less visible, therefore slightly blurring the exposition. 

Nevertheless, for reasons of completeness we have also constructed (see [4]) 
the function $\bar{N}$ together with proofs that it is correct. Ideally, this would amount 
to the following diagram 

[Diagram: $\bar{N}$ from $\bar{E}_f$ to $\bar{E}_f$, commuting with the interpretation into $F$.] 

However, $\bar{N}$ cannot have the dependent type $\bar{E}_f \to \bar{E}_f$ (for $f : F$), because the 
value (in $F$) of the output of the normalization function is not literally the same 
as its input value, but only provably equal to it. So, we cannot construct $\bar{N}$ as a 
term xN : (x:F)(xE x) -> (xE x). Instead we construct xN : fE -> fE, 
where fE is the type of pairs <f, e>, with f : F and e : (xE f). (In type 
theoretic terms, this is the $\Sigma$-type of dependent pairs $(f, e)$ with $f : F$ and 
$e : \bar{E}_f$.) Then we have to prove that if $\bar{N}((f, e))$ yields $(f', e')$, then $f$ and $f'$ are 
(provably) equal in $F$. 

If we cast this in purely mathematical terms, the situation is as follows. Define 
$\bar{E} := \Sigma f{:}F.\,\bar{E}_f$ and let $wf$ be the predicate on syntactic expressions stating that 
it has an interpretation (it is well-formed). It is defined as follows (for $e : E$). 

\[ wf(e) := \exists f{:}F\, (e \;]\!]_\rho\; f). \]

Now there are maps $\mathit{lift} : \{e : E \mid wf(e)\} \to \bar{E}$ and $|-| : \bar{E} \to \{e : E \mid wf(e)\}$. 
Furthermore, we can construct a proof-object 

\[ \mathit{normwf} : \forall e{:}E\, (wf(e) \to wf(N(e))). \]

Then we can read off the normalization function $\bar{N} : \bar{E} \to \bar{E}$ from the following 
diagram. 

\[ \{e : E \mid wf(e)\} \xrightarrow{\;N\;} \{e : E \mid wf(e)\} \]

The proof term normwf shows that $N$ is indeed a function from the set of 
well-formed expressions to itself. The correctness of $\bar{N}$ is given by 

\[ \mathit{normcorrect} : \forall \bar{e}{:}\bar{E}\, ([\![\bar{e}]\!] =_F [\![\bar{N}(\bar{e})]\!]). \]

Here $[\![-]\!] : \bar{E} \to F$ is the interpretation function mapping (heavy) syntactic 
expressions to elements of $F$. (As a matter of fact, it is just the first projection.) 


5 Partial Reflection in Practice 


The approach of partial reflection is successfully used in our current FTA project 
(Fundamental Theorem of Algebra). First of all, we have a tactic called Rational 
for proving equalities. This tactic is implemented as outlined above. 

But often we do not just want to prove an equality, but rather to use an 
equality to rewrite a goal in a different form. In order to explain how we have 
implemented rewrite tactics, we first say something about the equality in the 
FTA project. Our equality is just a congruence relation, respected by operations 
(such as + and *) and certain predicates (such as <). This means we cannot just 
replace equals by equals in any expression, but only those built-up from terms 
respecting our equality. (This stands in contrast to the standard Leibniz-equality 
in Coq; Leibniz-equals may be replaced in any proposition.) For instance, we have 
the following lemma: 


less_wd_left : (a,b,c:F)(a=b) -> (b<c) -> (a<c). 


Hence, we have defined rewriting tactics for each important predicate that 
respects our equality. For instance, the tactic Step_less_left t applies to a 
goal p<q: it lets Rational solve the equation t=p and returns the new goal t<q. 
It is defined for each t as 


(Apply less_wd_left with b:=t) ; 
[ Rational | (* Use Rational tactic to prove equality *) 
Idtac ] (* Do nothing with new inequality *) 


The following example illustrates its use. (Note that 1/z//H2 denotes 1 divided 
by z with the variable H2 as proof of the side condition z#0, that is, $z \neq_F 0$.) 


   x < y 

< Step_less_left x*z*(1/z//H2). 

   H1 : 0 < z 
   H2 : z # 0 
   H3 : x*z < y*z 

   x*z*(1/z//H2) < y 
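
To see why Rational can discharge the equation generated in this step, one can follow the normalization described in Section 3 by hand. The derivation below is a sketch that assumes $N$ behaves as outlined there (numerator and denominator sequences, then cancellation), with the constant 1 encoded as $u$:

```latex
\begin{align*}
x * z * (u / z) &\;\leadsto\; (s_1, s_2) = (x\,@\,z,\; z)
   && \text{step 1: numerator and denominator sequences} \\
 &\;\leadsto\; (x * u)/u
   && \text{step 3: the common variable $z$ is canceled} \\
N(x) &=_{\beta\delta\iota} (x * u)/u
   && \text{normal form of the other side}
\end{align*}
```

Both sides have the same normal form, so the equality between x*z*(1/z//H2) and x is accepted without generating a separate goal for the side condition on z.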

6 Conclusion 


We have extended the reflection method to include partial functions. The power 
of the method lies in the fact that no new proof obligations arise. So, if the user 
wants to prove a simple equation involving partial functions, the system does not 
(have to) generate a new set of goals (in order to prove that all partiality side 
conditions are fulfilled). That the necessary side conditions are fulfilled is already 
proven by the correctness of the normalization function. Phrased differently: 
normalization preserves well-definedness. The other crucial point is the fact that, 
although some syntactic expressions may be undefined, the ones that our tactic 
generates never are, for the simple reason that they are encodings of well-defined 
semantic objects in the theorem prover. So, the normalization function starts off 
from a syntactic expression that is well-defined (for the simple reason that the 
semantic object is its interpretation) and the well-definedness is preserved under 
normalization. 

As a side remark, we point out that the fact that the encoding always yields 
a well-defined syntactic expression is a statement on the meta-level. As the 
encoding function is a meta-function we cannot expect to state this literally in the 
theorem prover. We can state $\forall f{:}F\, \exists e{:}E\, \exists \rho\, (e \;]\!]_\rho\; f)$, but this does not capture 
what we want to say: it is trivially true, taking a variable $v$ for $e$ and $\rho(v) = f$, 
and it does not say anything about the encoding function. 

The actual implementation of the method as a tactic for solving equations 
between field elements has shown that this is a very useful technique. We believe 
it is very generally applicable in situations where partiality occurs. 

Abstract. Two methods of programming BDD-based symbolic algorithms 
in the Hol98 proof assistant are presented. The goal is to provide 
a platform for implementing intimate combinations of deduction 
and algorithmic verification, like model checking. The first programming 
method uses a small kernel of ML functions to convert between BDDs, 
terms and theorems. It is easy to use and is suitable for rapid prototyping 
experiments. The second method requires lower-level programming but 
can support more efficient calculations. It is based on an LCF-like use 
of an abstract type to encapsulate rules for manipulating judgements 
$\rho\; t \mapsto b$ meaning “logical term $t$ is represented by BDD $b$ with respect 
to variable order $\rho$”. The two methods are illustrated by showing how 
to perform the standard fixed-point calculation of the BDD of the set of 
reachable states of a finite state machine. 


1 Background and Motivation 


Theorem proving and model checking are complementary. Theorem proving can 
be applied to expressive formalisms (such as set theory and higher order logic) 
that are capable of modelling complex systems like complete processors. Ho- 
wever, theorem proving systems require skilled manual guidance to verify most 
properties of practical interest. Model checking is automatic, but can only be 
applied to relatively small problems (e.g. fragments of processors). It can also 
provide counter-examples of great use in debugging. 

The ideal would be to be able to automatically verify properties of complete 
systems (and find counter-examples when the verification of properties fails). This 
is not likely to be practical in the foreseeable future, so various compromises are 
being explored, for example 


(i) adding a layer of theorem proving on top of existing model checkers, to 
enable large problems to be deductively decomposed into smaller pieces that 
can be checked automatically [12,2]; 

(ii) adding checking algorithms to theorem provers so that subgoals can be ve- 
rified automatically [15] and counter-examples found. 


J. Harrison and M. Aagaard (Eds.): TPHOLs 2000, LNCS 1869, pp. 179-196, 2000. 

These two approaches differ in the starting point: (i) starts from a model 
checker and (ii) starts from a theorem prover. The goal is the same: combine the 
best of model checking and theorem proving. This paper concerns approach (ii). 

The Prosper project¹ is currently undertaking research into making HOL98 
the basis of a tool integration platform. Part of this work has resulted in the 
definition and implementation of a mechanism for enabling external tools to be 
‘plugged-in’ to HOL98. This supports the easy implementation of the kind of 
linking of theorem proving and model checking done in pioneering studies with 
PVS [15] and falls under (ii) above. 

This paper describes some experiments in adding simple model checking in- 
frastructure to the HOL98 theorem prover and so also falls under (ii). However, 
it differs from the PVS and Prosper approaches because it aims to provide secure 
and general programming infrastructure to allow users to implement their own 
bespoke BDD-based verification algorithms and then to tightly integrate them 
with existing HOL98 tools like the simplifier. Sometimes it is appropriate to use 
an existing off-the-shelf tool and sometimes it is appropriate to build one’s own 
bespoke solution. This paper concerns the latter. 

The HOL98 system is based on Milner’s LCF proof assistant [5]. In such 
systems arbitrary terms are values of a type term and can be freely constructed. 
However, theorems are represented as values of an abstract type thm whose 
primitive operations are axioms and rules of inference. Being an abstract type, 
values of type thm (i.e. theorems) can only be constructed using combinations 
of the primitive operations provided for the type thm — i.e. by proof. Theorem 
proving tools such as decision procedures, proof search strategies and simplifiers 
are implemented by composing together the primitive operations (i.e. axioms 
and inference rules) using programs in the ML programming language. 
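
The LCF discipline can be imitated in a few lines of Python. This is a toy sketch under assumptions, not the HOL98 kernel: terms are strings or tuples, the three rules shown (`assume`, `conj`, `conjunct1`) are chosen for illustration, and Python can only discourage, not prevent, direct construction of `Thm` values, whereas ML's abstract types enforce it.

```python
class Thm:
    """Theorems: meant to be created only by the inference rules below."""
    _key = object()   # capability token held by the kernel rules

    def __init__(self, key, hyps, concl):
        if key is not Thm._key:
            raise TypeError("theorems can only be built by inference rules")
        self.hyps = frozenset(hyps)
        self.concl = concl

    def __repr__(self):
        return "%s |- %r" % (sorted(map(repr, self.hyps)), self.concl)

def assume(t):
    """ASSUME: from a term t derive  t |- t."""
    return Thm(Thm._key, [t], t)

def conj(th1, th2):
    """CONJ: from  A |- p  and  B |- q  derive  A u B |- p /\\ q."""
    return Thm(Thm._key, th1.hyps | th2.hyps, ("and", th1.concl, th2.concl))

def conjunct1(th):
    """CONJUNCT1: from  A |- p /\\ q  derive  A |- p."""
    c = th.concl
    if not (isinstance(c, tuple) and len(c) == 3 and c[0] == "and"):
        raise ValueError("conclusion is not a conjunction")
    return Thm(Thm._key, th.hyps, c[1])
```

A derived tool is then an ordinary function that composes these rules, for example `conjunct1(conj(assume("p"), assume("q")))`, and whatever it returns is a theorem by construction.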

The goal of the research described in this paper is to extend the classical 
LCF-approach so that efficient symbolic calculations, like model checking, can 
also be implemented. This involves generalising the notion of theorem to include 
certain data representation judgements. 


2 Overview 


Let $M$ be a finite state machine whose state is a vector $(v_1, \ldots, v_n)$ of boolean 
variables $v_1, \ldots, v_n$. Let $P$ be some property of interest of states of $M$ and let $S$ 
be defined so that $S\; i\; (v_1, \ldots, v_n)$ is true if $(v_1, \ldots, v_n)$ is a state reachable in $i$ 
or fewer steps from an initial state of $M$. 

Any boolean term with boolean free variables can be represented by a binary 
decision diagram (BDD) [3]. An example of such a term is: 

\[ \forall i.\; S\; i\; (v_1, \ldots, v_n) \Rightarrow P(v_1, \ldots, v_n) \]

This is true if all reachable states of $M$ satisfy $P$. Note that although this 
term and all its free variables are boolean, it contains a bound variable $i$ 
ranging over natural numbers, so the standard BDD algorithms cannot construct 
its BDD. However, symbolic model checking algorithms provide ways of 
computing the BDDs of such terms. These algorithms can also compute the BDDs 
corresponding to much more complex properties of the execution of finite state 
machines (e.g. properties expressed in temporal logic). 

¹ http://www.dcs.gla.ac.uk/prosper 

In order to provide a platform for experimenting with programming model 
checking algorithms in the HOL98 proof assistant, and for combining them with 
deductive theorem proving, the BuDDy package² has been interfaced³ to Moscow 
ML⁴ so that BDDs can be manipulated as ML values of a type bdd and the 
various BuDDy operations are linked to ML functions. 

In this paper, the evolution of an initial simple style of programming with 
BDDs into a more efficient one is described. The computation of the set of reach- 
able states is used as a running example (more detailed examples are described 
elsewhere [6,7]). Note that 


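\[ \forall i.\; S\; i\; (v_1, \ldots, v_n) \Rightarrow P(v_1, \ldots, v_n) \]
is equivalent to 
\[ (\exists i.\; S\; i\; (v_1, \ldots, v_n)) \Rightarrow P(v_1, \ldots, v_n). \]
The set of reachable states is represented by the term $\exists i.\; S\; i\; (v_1, \ldots, v_n)$. 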
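
The fixed-point calculation behind this term can be sketched without BDDs. The Python fragment below swaps in explicit sets of states for the symbolic representation, so it shows only the shape of the iteration, not the efficiency of the BDD method; the function names and the toy machine are assumptions of the sketch.

```python
def reachable(initial, step):
    """Iterate S_0 = B, S_{i+1} = S_i plus successors of S_i, until stable."""
    current = frozenset(initial)
    while True:
        successors = frozenset(v for u in current for v in step(u))
        following = current | successors
        if following == current:
            return current          # fixed point: nothing new is reachable
        current = following

def step(state):
    """Toy transition relation: a 2-bit counter that counts 0, 1, 2 and then stays at 2."""
    b1, b0 = state
    n = 2 * b1 + b0
    m = min(n + 1, 2)
    return [(bool(m // 2), bool(m % 2))]

states = reachable([(False, False)], step)
```

Here `states` contains the encodings of 0, 1 and 2 but not `(True, True)`, so a property P that holds on those three states holds on all reachable states.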

The first method is described in Section 3 and is based on a validity-critical 
kernel of three ML functions: 

termToBdd   : term → bdd 
addEquation : thm → term × bdd 
bddOracle   : term → thm 


As long as these are implemented correctly (which includes BuDDy being 
correct) then the system is sound. The use of these functions to compute the 
BDD of $\exists i.\; S\; i\; (v_1, \ldots, v_n)$ is described in Section 3.1. 

The second method uses an abstract type termbdd that represents ‘judgements’ 
$\rho\; t \mapsto b$ that mean “HOL98 term $t$ is represented by BDD $b$ with respect 
to variable order $\rho$”. An LCF-like approach to ‘proving’ such judgements is 
implemented: the type termbdd implements judgements just like the type thm 
implements theorems. A fragment of the calculus of BDD representation 
judgements is presented in Section 4.1 (with some planned extensions in Section 4.3). 
The ML functions implementing rules of inference for judgements are given in 
boxes following the rules. The calculation of $\exists i.\; S\; i\; (v_1, \ldots, v_n)$ using judgements 
is described in Section 4.2. 

Some related work is discussed in Section 5. 


3 Representing Terms as BDDs 


The interface from BuDDy to Moscow ML provides an ML type bdd together 
with ML functions corresponding to the C functions in the BDD package. Using 
these, it is easy to implement a function that maps any quantified boolean 
formula⁵ to a BDD. Such a function has ML type term → bdd and is defined by a 
simple recursion over the structure of terms.⁶ 

² http://www.itu.dk/research/buddy/ 
³ http://www.itu.dk/research/muddy/ 
⁴ http://www.dina.kvl.dk/~sestoft/mosml.html 

Two approaches to supporting BDD calculation in HOL98 are described here. 
The first one is described in this section and is based on a global table that stores 
pairs $(t, b)$, where $t$ is a term and $b$ the BDD representing it. The second approach 
is described in Section 4. 

The HOL98 library HolBddLib uses the Moscow ML interface to BuDDy to 
implement BDD tools for HOL98 [7]. This library predefines the following ML 
function for adding entries to the BDD table 


addEquation : thm → term × bdd 


Evaluating addEquation($\vdash t_1 = t_2$) computes a BDD, $b_2$ say, of the term $t_2$ and 
then stores the association $(t_1, b_2)$ in the BDD table. The pair $(t_1, b_2)$ is also 
returned.⁷ 

Using the BDD table, the BDDs of boolean terms that contain defined con- 
stants can be computed. For example, suppose the constant Foo is defined by 


\[ \vdash \mathrm{Foo}(x, y) = (\text{large QBF involving $x$ and $y$}) \]

then the BDDs of terms such as $\exists z.\; \mathrm{Foo}(z, (y \vee z)) \wedge \mathrm{Foo}(z, (x \Rightarrow y))$ can be 
computed in two ways: (i) by expanding out the definition of Foo to get a QBF; 
and (ii) by using the precomputed BDD of $\mathrm{Foo}(x, y)$ stored in the table. 

The choice between (i) and (ii) depends on whether it is more efficient to 
separately recompute the BDDs of $\mathrm{Foo}(z, (y \vee z))$ and $\mathrm{Foo}(z, (x \Rightarrow y))$ from scratch 
or to get the BDDs by applying BuDDy operations to the BDD of $\mathrm{Foo}(x, y)$. An 
ML function termToBdd of type term → bdd is provided by HolBddLib that uses 
the second method (ii) to convert a term to a BDD. This works by deductively 
transforming terms to a form in which subterms correspond to entries in the BDD 
table. For example, applying termToBdd to $\exists z.\; \mathrm{Foo}(z, (y \vee z)) \wedge \mathrm{Foo}(z, (x \Rightarrow y))$ 
first uses a HOL98 conversion to prove the theorem 

\[
\begin{array}{l}
\vdash (\exists z.\; \mathrm{Foo}(z, (y \vee z)) \wedge \mathrm{Foo}(z, (x \Rightarrow y))) \\
\quad = \\
\quad (\exists z.\; (\exists y_1.\; (y_1 = y \vee z) \wedge \mathrm{Foo}(z, y_1)) \wedge \exists y_1.\; (y_1 = x \Rightarrow y) \wedge \mathrm{Foo}(z, y_1))
\end{array}
\]

The BDD of the left hand side of this equation can then be computed by 
computing the BDD of the right hand side, in which all applications of Foo have been 
transformed to be applied to sequences of distinct variables. The BDD of such 
applications can be obtained by a simple replacement operation on the BDD 
of $\mathrm{Foo}(x, y)$ and so the BDD of the large term on the right hand side of the 
definition of Foo does not need to be recomputed, just tweaked. 

⁵ A quantified boolean formula (QBF) is a term built out of boolean variables and 
constants (T, F) using boolean operators ($\neg$, $\wedge$, $\vee$, $\Rightarrow$, $=$ etc.) and quantification 
over boolean variables. 

⁶ The recursion needs to take into account BDD operations that optimise 
certain combinations of boolean constructions. For example, the BDD of a term 
$Q v_1 \cdots v_n.\; t_1 \;\mathit{op}\; t_2$, where $Q$ is a quantifier and $\mathit{op}$ a binary operator, can be 
efficiently constructed from the BDDs of $t_1$ and $t_2$ in a single step. First constructing 
the BDD of $t_1 \;\mathit{op}\; t_2$ and then doing $n$ quantifications would be inefficient. 

⁷ An exception is raised by addEquation if it is not applied to an equation or if the 
computation of the BDD of $t_2$ fails. 

The BDD representing a term is determined by an ordering of the variables. 
The variable order used by termToBdd can be explicitly declared, but if no order 
is declared, then variables get the order in which they are first encountered. 
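
The default ordering rule can be pictured with a small stand-alone sketch. It does not use HolBddLib: the datatype term, the association-list variable map and the function names below are all hypothetical, chosen only to show variables being numbered in the order in which a left-to-right traversal first meets them.

```sml
(* Hypothetical toy term type; not the HOL98 term type. *)
datatype term = V of string
              | Not of term
              | And of term * term
              | Or of term * term

(* Look up a variable in an association list. *)
fun lookup (x : string) [] = NONE
  | lookup x ((y, n) :: rest) = if x = y then SOME n else lookup x rest

(* Extend the map with x unless it is already present;
   a new variable receives the next unused number. *)
fun extend (x, m) =
  case lookup x m of
      SOME _ => m
    | NONE => m @ [(x, length m)]

(* Traverse left to right, numbering variables as they are first met. *)
fun order (V x) m = extend (x, m)
  | order (Not t) m = order t m
  | order (And (t1, t2)) m = order t2 (order t1 m)
  | order (Or (t1, t2)) m = order t2 (order t1 m)

(* (y \/ z) /\ (x \/ y): y is met first, then z, then x. *)
val example = order (And (Or (V "y", V "z"), Or (V "x", V "y"))) []
```

Here example evaluates to [("y",0),("z",1),("x",2)]; declaring an order explicitly would correspond to starting from a non-empty map.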

The creation of HOL98 theorems via BDD calculation is provided by a single
ML function bddOracle : term → thm which maps a term t to the theorem ⊢ t
if termToBdd(t) is the BDD TRUE and raises an exception otherwise.


3.1 Computing Reachable States Using First Method 


Suppose constants B and R are defined to represent, respectively, a set of initial
states of a machine and its transition relation:

    B_def = ⊢ B(v1,...,vn) = ···
    R_def = ⊢ R((v1,...,vn),(v1',...,vn')) = ···

The predicate S i representing the set of states reachable in i or fewer steps
is then defined recursively by

    S_def = ⊢ (S 0 v = B v)
              ∧
              ∀i. S (i+1) v = (S i v ∨ (∃u. S i u ∧ R(u,v)))

where the variables u and v range over n-tuples of booleans and so can be
specialised to tuples of boolean variables (u1,...,un) and (v1,...,vn), respectively,
and then separate base and step cases derived as theorems S_0 and S_suc, where

    S_0   = ⊢ S 0 (v1,...,vn) = B (v1,...,vn)
    S_suc = ⊢ ∀i. S (i+1) (v1,...,vn) =
                   S i (v1,...,vn)
                   ∨
                   ∃u1 ⋯ un. S i (u1,...,un) ∧ R((u1,...,un),(v1,...,vn))

To compute the BDD of the set of reachable states, first add the definitions 
of B and R to the BDD table: 
addEquation B_def; 
addEquation R_def; 


next compute the BDDs of S 0 (v1,...,vn), S 1 (v1,...,vn), S 2 (v1,...,vn)
etc. Note from S_suc that the computation of the BDD of S (i+1) (v1,...,vn)
needs the BDD of S i (v1,...,vn), thus the order in which BDDs are added
to the table is important. The ∀-quantified variable i in S_suc can be successively
specialised to 0, 1, 2 etc. with the HOL98 inference rule SPEC (evaluating
SPEC t1 (⊢ ∀i. t2(i)) specialises i to t1 to deduce ⊢ t2(t1)). To then reduce the
numeral i+1, the function SimpNum can be used (SimpNum reduces arithmetical
combinations of numerals, e.g. simplifies occurrences of “2+1” to “3”).


addEquation S_0;
addEquation (SimpNum(SPEC “0” S_suc));
addEquation (SimpNum(SPEC “1” S_suc));
addEquation (SimpNum(SPEC “2” S_suc));
After these six applications of addEquation, the BDD table will consist of

    (B(v1,...,vn), b1)
    (R((v1,...,vn),(v1',...,vn')), b2)
    (S 0 (v1,...,vn), b3)
    (S 1 (v1,...,vn), b4)
    (S 2 (v1,...,vn), b5)
    (S 3 (v1,...,vn), b6)

where, in fact, b1 = b3. Note that an evaluation of

addEquation (SimpNum(SPEC “i” S_suc));

uses termToBdd to compute the BDD of the right hand side of S_suc with the
variable i specialised to the numeral i. Since the BDDs of S i (v1,...,vn) and
R((v1,...,vn),(v1',...,vn')) are already in the BDD table their BDDs can be reused.

Since the state space is finite and the sets of states represented by S i increase
as i increases, it follows that eventually, for some particular i,

    S i (v1,...,vn) = S (i+1) (v1,...,vn)

which can be tested for at each stage by evaluating:

    bddOracle “S i (v1,...,vn) = S (i+1) (v1,...,vn)”

This will either raise an exception (fixed-point not yet reached) or return a
theorem ⊢ S i (v1,...,vn) = S (i+1) (v1,...,vn). When this theorem is proved,
the BDD of the set of reachable states is clearly the BDD of S i (v1,...,vn) (see
the theorem FpTh below).

The fixed-point is easily computed by an ML function that makes use of the 
auxiliary functions described in the following table. 


ML function    ML type               Explanation
intToTerm      int → term            intToTerm n = “n”
concl          thm → term            concl(⊢ t) = “t”
lhs            term → term           lhs “t1 = t2” = “t1”
rhs            term → term           rhs “t1 = t2” = “t2”
LeftDisjunct   term → term           LeftDisjunct “t1 ∨ t2” = “t1”
mk_eq          term × term → term    mk_eq(“t1”, “t2”) = “t1 = t2”


In the function definition below, ML comments are enclosed between (* and *). 


fun iterateToFixedPoint S_suc i =
 let val i_tm = intToTerm i
     val Sth  = SimpNum(SPEC i_tm S_suc)
     val S1   = LeftDisjunct(rhs(concl Sth))   (* S i (...)     *)
     val S2   = lhs(concl Sth)                 (* S (i+1) (...) *)
 in
  addEquation Sth;           (* adds BDD of S (i+1) (...) to BDD table *)
  (bddOracle(mk_eq(S1,S2)), i_tm)
  handle oracleError => iterateToFixedPoint S_suc (i+1)
 end
The function iterateToFixedPoint just iterates S_suc until a fixed-point is
reached and then, if the fixed-point is reached after i iterations, returns a pair
whose second component is “i” and whose first component is the theorem:

    ⊢ S i (v1,...,vn) = S (i+1) (v1,...,vn)

The following fixed-point theorem is straightforward to prove

    FpTh = ⊢ ∀i. (∀v1 ⋯ vn. S i (v1,...,vn) = S (i+1) (v1,...,vn))
                 ⇒
                 (∀v1 ⋯ vn. S i (v1,...,vn) = ∃i. S i (v1,...,vn))

The function ComputeReachableStates defined below computes a pair, returned
by addEquation, whose first component is the term ∃i. S i (v1,...,vn) and
whose second component is the BDD of this term. The definition of the function
uses Modus Ponens, which is represented by the ML function MP (evaluating
MP (⊢ t1 ⇒ t2) (⊢ t1) returns the theorem ⊢ t2). The definition also uses GEN_ALL,
which proves the universal closure of a theorem (i.e. universally quantifies all
free variables), and SYM, which reverses an equation (evaluating SYM(⊢ t1 = t2)
returns the theorem ⊢ t2 = t1).


fun ComputeReachableStates B_def R_def S_0 S_suc = 
(addEquation B_def; 
addEquation R_def; 
addEquation S_0; 
let val (th,i_tm) = iterateToFixedPoint S_suc 0 
in 
addEquation(SYM(MP (SPEC i_tm FpTh) (GEN_ALL th))) 
end) 


The theorems S_0 and S_suc are just consequences of the definition of S, so 
could be computed from B_def and R_def. Thus ComputeReachableStates only 
needs to take B_def and R_def as parameters [6]. 
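
The shape of this iteration can be tried out without HOL98 or BuDDy. The sketch below is only an analogy: sets of states are sorted integer lists rather than BDDs, the transition relation is an explicit list of pairs, and the names insert, union, image, iterate and reachable are hypothetical. The equality test s' = s plays the role that bddOracle plays in iterateToFixedPoint.

```sml
(* Sets of states as sorted, duplicate-free int lists. *)
fun insert (x : int) [] = [x]
  | insert x (y :: ys) =
      if x < y then x :: y :: ys
      else if x = y then y :: ys
      else y :: insert x ys

fun union xs ys = foldl (fn (x, acc) => insert x acc) ys xs

(* image r s: every v such that (u,v) is in r for some u in s. *)
fun image (r : (int * int) list) s =
  foldl (fn ((u, v), acc) =>
           if List.exists (fn x => x = u) s then insert v acc else acc)
        [] r

(* iterate r s i: s stands for S i; stop at the first i with S i = S (i+1). *)
fun iterate r s i =
  let val s' = union s (image r s)          (* S (i+1) = S i  U  image of S i *)
  in if s' = s then (s, i) else iterate r s' (i + 1) end

(* b is the set of initial states; union b [] sorts and deduplicates it. *)
fun reachable b r = iterate r (union b []) 0

val example = reachable [0] [(0,1),(1,2),(2,0),(3,4)]
```

For this relation example is ([0,1,2], 2): states 3 and 4 are never reached, and the fixed point is detected at i = 2, which corresponds to proving S 2 = S 3.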

Note that executing ComputeReachableStates B_def R_def S_0 S_suc involves
computing the BDD of R((v1,...,vn),(v1',...,vn')), which may be large.

Instead, it may be possible to derive an equation, S_simp_suc say, that expresses
S (i+1) (v1,...,vn) in terms of S i (v1,...,vn) in a way that avoids
having to compute the BDD of the transition relation. This can be achieved, for
example, by disjunctive partitioning⁸ [10, page 79]. In this case, the calculation
of reachable states can be done just with


fun ComputeReachableStates B_def R_def S_0 S_simp_suc = 
(addEquation B_def; 
addEquation S_0; 
let val (th,i_tm) = iterateToFixedPoint S_simp_suc 0 
in 
addEquation(SYM(MP (SPEC i_tm FpTh) (GEN_ALL th))) 
end) 


8 Disjunctive partitioning is called ‘early quantification’ by some authors [13, page 45]
and also called ‘miniscoping’ in the context of theorem proving.
Note that in this definition of ComputeReachableStates the argument R_def
is not used.

Disjunctive partitioning can be done automatically using the HOL98 simplifier.
To illustrate this nice example of synergy between HOL98 and BuDDy
consider the transition relation R that corresponds to the interleaving of three
assignments v1 := E1(v1,v2,v3), v2 := E2(v1,v2,v3) and v3 := E3(v1,v2,v3)

    R((v1,v2,v3),(v1',v2',v3')) =
     (v1' = E1(v1,v2,v3) ∧ v2' = v2 ∧ v3' = v3) ∨
     (v1' = v1 ∧ v2' = E2(v1,v2,v3) ∧ v3' = v3) ∨
     (v1' = v1 ∧ v2' = v2 ∧ v3' = E3(v1,v2,v3))

If ū, v̄ abbreviate (u1,u2,u3), (v1,v2,v3), respectively, then S (i+1) v̄ is given by

    S (i+1) v̄ = S i v̄ ∨ (∃ū. S i ū ∧ R(ū,v̄))

Disjunctive partitioning is the following simplification of the right disjunct:

    ∃ū. S i ū ∧ R(ū,v̄)

    = ∃ū. S i ū ∧ ((v1 = E1 ū ∧ v2 = u2 ∧ v3 = u3) ∨
                   (v1 = u1 ∧ v2 = E2 ū ∧ v3 = u3) ∨
                   (v1 = u1 ∧ v2 = u2 ∧ v3 = E3 ū))

    = (∃ū. S i ū ∧ v1 = E1 ū ∧ v2 = u2 ∧ v3 = u3) ∨
      (∃ū. S i ū ∧ v1 = u1 ∧ v2 = E2 ū ∧ v3 = u3) ∨
      (∃ū. S i ū ∧ v1 = u1 ∧ v2 = u2 ∧ v3 = E3 ū)

    = ((∃u1. S i (u1,v2,v3) ∧ v1 = E1(u1,v2,v3)) ∧
       (∃u2. v2 = u2) ∧
       (∃u3. v3 = u3)) ∨
      ((∃u1. v1 = u1) ∧
       (∃u2. S i (v1,u2,v3) ∧ v2 = E2(v1,u2,v3)) ∧
       (∃u3. v3 = u3)) ∨
      ((∃u1. v1 = u1) ∧
       (∃u2. v2 = u2) ∧
       (∃u3. S i (v1,v2,u3) ∧ v3 = E3(v1,v2,u3)))

    = (∃u1. S i (u1,v2,v3) ∧ v1 = E1(u1,v2,v3)) ∨
      (∃u2. S i (v1,u2,v3) ∧ v2 = E2(v1,u2,v3)) ∨
      (∃u3. S i (v1,v2,u3) ∧ v3 = E3(v1,v2,u3))

Thus the BDD of ∃ū. S i ū ∧ R(ū,v̄) can be computed without ever computing
the BDD of R(ū,v̄) and also without performing the BDD operation corresponding
to ∃ū (which might be expensive if there are lots of state variables). It
follows that the BDD of S (i+1) v̄ can be computed from the BDD of S i v̄
without computing the BDD of R(ū,v̄), just by combining small BDDs via boolean
operations and single quantifications.
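
The equivalence of the first and last lines of this derivation can be checked by brute force on a small instance. The sketch below is not part of HolBddLib: the next-value functions e1, e2, e3 and the predicate s0 are arbitrary choices made for illustration, and predicates are ordinary ML functions rather than BDDs. imageDirect quantifies over all three u variables against the full relation r, whereas imagePartitioned uses three single quantifications and never mentions r.

```sml
type state = bool * bool * bool

val bools = [false, true]
val states : state list =
  List.concat (map (fn a =>
    List.concat (map (fn b => map (fn c => (a, b, c)) bools) bools)) bools)

(* Hypothetical next-value functions E1, E2, E3. *)
fun e1 ((v1, v2, v3) : state) = not v1
fun e2 ((v1, v2, v3) : state) = v1 andalso v3
fun e3 ((v1, v2, v3) : state) = v1 orelse v2

(* Monolithic relation R(u,v) for the three interleaved assignments. *)
fun r (u : state, v : state) =
  let val (u1, u2, u3) = u
      val (v1, v2, v3) = v
  in (v1 = e1 u andalso v2 = u2 andalso v3 = u3) orelse
     (v1 = u1 andalso v2 = e2 u andalso v3 = u3) orelse
     (v1 = u1 andalso v2 = u2 andalso v3 = e3 u)
  end

(* exists u. S u /\ R(u,v), quantifying over whole states. *)
fun imageDirect s v = List.exists (fn u => s u andalso r (u, v)) states

(* The partitioned form: one quantified variable per disjunct, no use of r. *)
fun imagePartitioned s ((v1, v2, v3) : state) =
  List.exists (fn u1 => s (u1, v2, v3) andalso v1 = e1 (u1, v2, v3)) bools orelse
  List.exists (fn u2 => s (v1, u2, v3) andalso v2 = e2 (v1, u2, v3)) bools orelse
  List.exists (fn u3 => s (v1, v2, u3) andalso v3 = e3 (v1, v2, u3)) bools

(* A hypothetical S i: only the all-false state. *)
fun s0 ((v1, v2, v3) : state) = not v1 andalso not v2 andalso not v3

val agree = List.all (fn v => imageDirect s0 v = imagePartitioned s0 v) states
```

The value agree is true, as the derivation predicts for any choice of S i and E1, E2, E3; the check is exhaustive only for this one instance and is no substitute for the deductive proof.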

The usual implementation of disjunctive partitioning is by writing programs 
that directly construct the BDD of the simplified term. The logical transformati- 
ons are thus encoded in BDD building code. The approach here is to deductively 
simplify the next-state relation. The advantage is that the simplification is easy 
to program (given good simplification tools) and is guaranteed to be sound. Si- 
milar deductive simplifications come up in computing the backward image of a 
transition relation when finding the shortest sequence of states to a counterexample.
However, it remains to be seen if deductive simplification using HOL98
can lead to new techniques, rather than nice ways of implementing existing ones!


3.2 Discussion of First Method 


The style of programming illustrated by the definition of iterateToFixedPoint 
has good and bad points. It is good in that it is easy to experiment with BDD 
calculations via HOL98 terms: for example, disjunctive partitioning is easy to 
implement using the HOL98 simplifier. However, as the BDD table accumula- 
tes entries, the runtime of termToBdd (and hence addEquation) slows down. 
Furthermore, on large terms the process of transforming subterms to enable 
previously computed BDDs to be reused gets slow. To try to alleviate such 
performance problems, the data structure representing the BDD table is quite 
complex, combining a hash table with a discrimination net (to find matches to 
terms that are not in the map, but are instances of terms that are [4]). The 
resulting structure and the associated code supporting it is complex and hard 
to maintain. Furthermore, termToBdd makes some fixed choices about how to 
invoke BuDDy operations that might not be sensible for some situations. For 
example, for historical reasons⁹, a term Foo(t1,t2), where t1 and t2 are boolean
terms, will be transformed to ∃v1 v2. (v1 = t1) ∧ (v2 = t2) ∧ Foo(v1,v2), and
then the BDD of this is computed using BuDDy’s operations for conjunction
and quantification. This works, but it might well be more efficient to separately
compute the BDDs of t1 and t2 and then to use BDD substitution rather than
quantification. There is no problem upgrading termToBdd to use substitution, 
but as more and more changes are made to optimise the code, the result becomes 
a heuristic expert system whose performance is hard to predict or control. 

Another problem concerns storage management. The Moscow ML and BuDDy 
garbage collectors are linked, but the BDD map will keep BDDs around and 
block their collection, even if they are no longer used. This can be managed 
by having a deleteBdd operation that removes entries from the BDD table, 
and then inserting calls to the function into programs. For example, modifying 
iterateToFixedPoint to 


fun iterateToFixedPoint S_suc i =
 let val i_tm = intToTerm i
     val Sth  = SimpNum(SPEC i_tm S_suc)
     val S1   = LeftDisjunct(rhs(concl Sth))
     val S2   = lhs(concl Sth)
 in
  addEquation Sth;                               (* S1 now used *)
  (bddOracle(mk_eq(S1,S2)), i_tm)
  handle oracleError => (deleteBdd S1;           (* delete S1 entry *)
                         iterateToFixedPoint S_suc (i+1))
 end

however, such explicit BDD management is hard to get right and it is easy to
leave storage leaks.

9 The first version of the BuDDy interface to Moscow ML did not support the operation
that substitutes a BDD for a variable in another BDD.

In the next section a second style of programming is described that enables
tightly tuned algorithms to be implemented whose performance is predictable.


4 BDD Representation Judgements 


In the LCF approach, theorems are represented by an abstract type whose pri- 
mitive operations are the axioms and inference rules of a logic. Theorem proving 
tools are implemented by composing together the inference rules using ML pro- 
grams. 

This idea can be generalised to computing valid judgements that represent
other kinds of information. In particular, consider judgements (ρ, t, b), where ρ
represents a variable order, t is a boolean term all of whose free variables are
boolean and b is a BDD. Such a judgement is valid if b is the BDD representing
t with respect to ρ, and we will write ρ t ↦ b when this is the case.

The derivation of ‘theorems’ like ρ t ↦ b can be viewed as ‘proof’ in the
style of LCF by defining an abstract type termbdd whose primitive operations
correspond to the BDD functions provided by BuDDy. The type termbdd models
judgements ρ t ↦ b analogously to the way the type thm models theorems ⊢ t.


4.1 Rules for BDD Representations 


BDD variables in BuDDy are represented by natural numbers and the ordering
used is the standard one. A variable ordering ρ can thus be represented by a
partial function, called a variable map, from logic variables (a subset of terms in
HOL98) to numbers. An ML function Var of ML type int → bdd maps a number
to the corresponding BDD variable.¹⁰ Thus an inference rule for inferring valid
triples ρ v ↦ b, where v is a variable, is

               ρ(v) = n
    BddVar  ---------------
             ρ v ↦ Var n



The name of the ML function corresponding to this rule is BddVar, as indica- 
ted to the left of the rule. In what follows, the descriptions of the ML functions 
implementing the rules are given in easy-to-skip boxes. For example: 


BddVar : term → termbdd
BddVar(v) returns ρG v ↦ b, where if v already has an associated BDD in the
global map ρG, then b = Var(ρG(v)) and if v is not in the map, then ρG is extended
so that ρG(v) = n, where n is the first unused BDD variable, and then b = Var(n).
In this case BddVar(v) has a side-effect on ρG.


10 The names and descriptions of functions have been simplified to improve the
exposition. For example, the function called Var in this paper is actually called ithvar
in HolBddLib. Furthermore, some functions described here have not yet been
implemented at the time of writing. An example illustrating the programs described in this
paper can be seen at http://www.cl.cam.ac.uk/~mjcg/BDD/TPHOLs2000Paper.ml.
The rules given in this section have the same variable map ρ in the hypotheses
and conclusion. The current implementation assumes there is a single global
variable map, ρG, held in an assignable (reference) variable, and the construction
of BDDs is done with respect to this (see Section 4.3 for further discussion). The
user may explicitly set up this map at the beginning of a session, or let it be
built incrementally as needed. We will write t ↦ b to mean that t is represented
by b with respect to the current global variable map, i.e. ρG t ↦ b. Each rule is
implemented by an ML function that takes values of type termbdd representing
any hypothesis judgements and also other parameters (e.g. a term v representing
a variable, as in BddVar) and returns a value representing the conclusion. For
example, BddNot below takes an ML value corresponding to a judgement that
gives a BDD representation for a term t and returns a value corresponding to a
judgement representing ¬t.


The HOL98 logical constants T and F (values of type term) denote truth
and falsity, respectively, whereas the values TRUE and FALSE of ML type bdd are
the corresponding BDDs. The function NOT : bdd → bdd creates the negation of a
BDD.

BddT : termbdd, BddF : termbdd
BddT and BddF are predefined to be T ↦ TRUE and F ↦ FALSE, respectively.

BddNot : termbdd → termbdd
BddNot(t ↦ b) returns ¬t ↦ NOT b.


The rules for propositional connectives are straightforward. The BuDDy
binary operators AND, OR, IMP, BIIMP construct the conjunction, disjunction,
implication and equivalence of BDDs.

             ρ t1 ↦ b1    ρ t2 ↦ b2
    BddAnd  -------------------------
             ρ t1 ∧ t2 ↦ b1 AND b2

             ρ t1 ↦ b1    ρ t2 ↦ b2
    BddOr   -------------------------
             ρ t1 ∨ t2 ↦ b1 OR b2

             ρ t1 ↦ b1    ρ t2 ↦ b2
    BddImp  -------------------------
             ρ t1 ⇒ t2 ↦ b1 IMP b2

             ρ t1 ↦ b1    ρ t2 ↦ b2
    BddEq   ---------------------------
             ρ t1 = t2 ↦ b1 BIIMP b2

BddAnd, BddOr, BddImp, BddEq : termbdd × termbdd → termbdd
BddAnd(t1 ↦ b1, t2 ↦ b2) returns t1 ∧ t2 ↦ b1 AND b2;
BddOr(t1 ↦ b1, t2 ↦ b2) returns t1 ∨ t2 ↦ b1 OR b2;
BddImp(t1 ↦ b1, t2 ↦ b2) returns t1 ⇒ t2 ↦ b1 IMP b2;
BddEq(t1 ↦ b1, t2 ↦ b2) returns t1 = t2 ↦ b1 BIIMP b2.
The functions Forall and Exists of type (int list) → bdd → bdd quantify
BDDs, thus

                ρ t ↦ b    ρ(v1) = n1  ⋯  ρ(vp) = np
    BddForall  ----------------------------------------
                ρ ∀v1 ⋯ vp. t ↦ Forall [n1,...,np] b

                ρ t ↦ b    ρ(v1) = n1  ⋯  ρ(vp) = np
    BddExists  ----------------------------------------
                ρ ∃v1 ⋯ vp. t ↦ Exists [n1,...,np] b

BddForall, BddExists both of type term list → termbdd → termbdd
BddForall [v1,...,vn] (t ↦ b) returns
  ∀v1 ⋯ vn. t ↦ Forall [ρG(v1),...,ρG(vn)] b;
BddExists [v1,...,vn] (t ↦ b) returns
  ∃v1 ⋯ vn. t ↦ Exists [ρG(v1),...,ρG(vn)] b.
If any of the variables v1, ..., vn are not in the global variable map, then the
map is extended. Thus BddForall and BddExists might side-effect ρG.


The BDDs of quantifications of conjunctions can be built by calling AND
followed by Forall or Exists, but it is more efficient to use optimised algorithms
ForallAnd and ExistsAnd provided by BuDDy.

                   ρ t1 ↦ b1    ρ t2 ↦ b2    ρ(v1) = n1  ⋯  ρ(vp) = np
    BddForallAnd  ---------------------------------------------------------
                   ρ ∀v1 ⋯ vp. t1 ∧ t2 ↦ ForallAnd [n1,...,np] b1 b2

                   ρ t1 ↦ b1    ρ t2 ↦ b2    ρ(v1) = n1  ⋯  ρ(vp) = np
    BddExistsAnd  ---------------------------------------------------------
                   ρ ∃v1 ⋯ vp. t1 ∧ t2 ↦ ExistsAnd [n1,...,np] b1 b2

BddForallAnd, BddExistsAnd of type term list → termbdd → termbdd → termbdd
BddForallAnd [v1,...,vn] (t1 ↦ b1) (t2 ↦ b2) returns
  ∀v1 ⋯ vn. t1 ∧ t2 ↦ ForallAnd [ρG(v1),...,ρG(vn)] b1 b2;
BddExistsAnd [v1,...,vn] (t1 ↦ b1) (t2 ↦ b2) returns
  ∃v1 ⋯ vn. t1 ∧ t2 ↦ ExistsAnd [ρG(v1),...,ρG(vn)] b1 b2.
If any of the variables v1, ..., vn are not in the global variable map, then the
map is extended. Thus BddForallAnd and BddExistsAnd might side-effect ρG.


The next rule expresses the fact that logically equivalent terms have the same
BDD.

               ⊢ t1 = t2    ρ t1 ↦ b
    BddEqMp  -------------------------
               ρ t2 ↦ b

BddEqMp : thm → termbdd → termbdd
BddEqMp (⊢ t1 = t2) (t1 ↦ b) returns t2 ↦ b.


Let t{v1←v1',...,vp←vp'} denote the result of replacing distinct free variables
v1, ..., vp in a term t with distinct variables v1', ..., vp', respectively, renaming
any bound variables in t to avoid capture. Let b{n1←n1',...,np←np'} denote the
result of replacing distinct BDD variables n1, ..., np in a BDD b with distinct
variables n1', ..., np', respectively. Let Domain(ρ) denote the set of variables ρ is
defined on (i.e. Domain(ρ) = {v | ∃n. ρ(v) = n}).

                 ρ t ↦ b    {v1,...,vp,v1',...,vp'} ⊆ Domain(ρ)
    BddReplace  -------------------------------------------------------------------
                 ρ t{v1←v1',...,vp←vp'} ↦ b{ρ(v1)←ρ(v1'),...,ρ(vp)←ρ(vp')}

BddReplace : (term × term) list → termbdd → termbdd
BddReplace [(v1,v1'),...,(vn,vn')] (t ↦ b) returns
  t{v1←v1',...,vn←vn'} ↦ b{ρG(v1)←ρG(v1'),...,ρG(vn)←ρG(vn')}.
If any of the variables vi (1 ≤ i ≤ n) are not in the global variable map, then they
are added if necessary. Thus ρG may be side-effected.


The function bddOracle described earlier converts a term t to a BDD using
termToBdd, which might be slow, and then returns ⊢ t if the resulting BDD is
TRUE. The rule TermBddOracle below just checks whether the BDD part of a
judgement is TRUE and if so creates a theorem whose conclusion is the term part.
It is thus very efficient.

                     ρ t ↦ TRUE
    TermBddOracle  -------------
                        ⊢ t

TermBddOracle : termbdd → thm
TermBddOracle(t ↦ b) returns the theorem ⊢ t if b is TRUE, otherwise an exception
is raised.


The ML functions bddOracle and TermBddOracle are the only ways of 
creating theorems from BDDs using HolBddLib. Eventually it is expected that 
bddOracle will be defined in terms of TermBddOracle. 
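
The LCF-style discipline behind termbdd can be illustrated with a toy module. This is only a sketch under strong assumptions and not the HolBddLib implementation: terms are a small hypothetical datatype (whose constructor Var names a term variable, unlike the paper's Var : int → bdd), a ‘BDD’ is modelled as a predicate on valuations rather than a BuDDy graph, and the oracle returns a term in place of a theorem. The point is the opaque signature: outside the structure, a judgement can only be built by applying the rules.

```sml
(* Toy terms standing in for HOL98 terms (hypothetical). *)
datatype term = Var of string | T
              | Neg of term | Conj of term * term | Disj of term * term

signature TERMBDD =
sig
  type termbdd                           (* abstract judgement  t |-> b *)
  exception oracleError
  val BddVar : term -> termbdd
  val BddT   : termbdd
  val BddNot : termbdd -> termbdd
  val BddAnd : termbdd * termbdd -> termbdd
  val BddOr  : termbdd * termbdd -> termbdd
  val TermBddOracle : termbdd -> term    (* a term stands in for a thm *)
end

structure TermBdd :> TERMBDD =
struct
  type bdd = (int -> bool) -> bool       (* stand-in for a real BDD *)
  type termbdd = term * bdd
  exception oracleError

  val varMap : (string * int) list ref = ref []   (* global variable map *)

  (* Number of a variable, extending the global map if it is new. *)
  fun number x =
    case List.find (fn (y, _) => y = x) (!varMap) of
        SOME (_, n) => n
      | NONE => let val n = length (!varMap)
                in varMap := (!varMap) @ [(x, n)]; n end

  fun BddVar (Var x) = let val n = number x in (Var x, fn env => env n) end
    | BddVar _ = raise Fail "BddVar: not a variable"
  val BddT = (T, fn _ => true)
  fun BddNot (t, b) = (Neg t, fn env => not (b env))
  fun BddAnd ((t1, b1), (t2, b2)) = (Conj (t1, t2), fn env => b1 env andalso b2 env)
  fun BddOr ((t1, b1), (t2, b2)) = (Disj (t1, t2), fn env => b1 env orelse b2 env)

  (* All valuations of the variables 0..n-1. *)
  fun valuations 0 = [fn _ => false]
    | valuations n =
        List.concat (map (fn env => [fn k => if k = n - 1 then false else env k,
                                     fn k => if k = n - 1 then true else env k])
                         (valuations (n - 1)))

  (* Succeeds only if the semantic part is true under every valuation. *)
  fun TermBddOracle (t, b) =
    if List.all b (valuations (length (!varMap))) then t else raise oracleError
end

val p = TermBdd.BddVar (Var "p")
val excludedMiddle = TermBdd.TermBddOracle (TermBdd.BddOr (p, TermBdd.BddNot p))
val rejected = (TermBdd.TermBddOracle p; false) handle TermBdd.oracleError => true
```

Here excludedMiddle is Disj (Var "p", Neg (Var "p")) and rejected is true. With a real BDD package the check in TermBddOracle is a constant-time comparison with TRUE, which is why the rule is cheap; the exhaustive enumeration above exists only because this toy representation is not canonical.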

Finally, the function termToTermBdd provides a way of using the first method
of programming with BDDs to get some values of termbdd as a starting point
for invoking the rules of the second method.

termToTermBdd : term → termbdd
termToTermBdd(t) applies termToBdd to t to get b and then returns t ↦ b.


4.2 Computing Reachable States Using Representation Judgements 


Suppose th is an equational theorem ⊢ t1 = t2, e.g. a definition. If the right
hand side t2 can be represented as a BDD b using termToBdd, then a BDD
representation tb = t1 ↦ b can be created by evaluating

val tb = BddEqMp (SYM th) (termToTermBdd(rhs(concl th)))

Define BddDef : thm → termbdd by

fun BddDef th = BddEqMp (SYM th) (termToTermBdd(rhs(concl th)))
Suppose constants B and R have been defined by theorems B_def and R_def,
respectively, then BDD representation judgements

    tbB = B (v1,...,vn) ↦ bB
    tbR = R((v1,...,vn),(v1',...,vn')) ↦ bR

are defined in ML by

val tbB = BddDef B_def
and tbR = BddDef R_def;

Suppose that S has been defined by S_def and for some particular value of
i the BDD representation tbi = S i (v1,...,vn) ↦ bi has been computed, for
example, for i = 0, tb0 is defined by

val tb0 = BddEqMp (SYM S_0) tbB


An important BDD to calculate is the image of S i under R:

    ∃u1 ⋯ un. S i (u1,...,un) ∧ R((u1,...,un),(v1,...,vn))

If this is directly converted to a BDD representation judgement, new BDD
variables for u1, ..., un may be created, which could be inefficient. A better
strategy is to reuse the existing variables v1', ..., vn', by computing instead the
BDD of

    ∃v1' ⋯ vn'. S i (v1',...,vn') ∧ R((v1',...,vn'),(v1,...,vn))

A derived rule, BddImage : termbdd → termbdd → termbdd is easily defined to
compute the image of a set under a transition relation. If

    tbP = P (v1,...,vn) ↦ bP
    tbR = R((v1,...,vn),(v1',...,vn')) ↦ bR

then, so that the result represents the existentially quantified term just displayed,

    BddImage tbP tbR =
     BddExistsAnd
      [v1',...,vn']
      (BddReplace [(v1,v1'),...,(vn,vn')] tbP)
      (BddReplace [(v1,v1'),...,(vn,vn'),(v1',v1),...,(vn',vn)] tbR)

For example, since S (i+1) (v1,...,vn) is

    S i (v1,...,vn) ∨ ∃u1 ⋯ un. S i (u1,...,un) ∧ R((u1,...,un),(v1,...,vn))

the BDD representation of this is computed by BddOr(tbi, BddImage tbi tbR).
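
The variable-reuse trick behind the image computation can be checked on a two-bit example. The sketch below is a stand-alone model and does not call BuDDy: a ‘BDD’ is again a predicate on valuations of numbered variables, replace and existsAnd are hypothetical stand-ins for the corresponding BuDDy operations, and variables 0,1 play the role of v1,v2 while 2,3 play the role of v1',v2'.

```sml
type bdd = (int -> bool) -> bool        (* stand-in: predicate on valuations *)

(* replace [(n,n'),...] b: simultaneously read n' wherever b read n. *)
fun replace (pairs : (int * int) list) (b : bdd) : bdd =
  fn env => b (fn k => case List.find (fn (n, _) => n = k) pairs of
                           SOME (_, n') => env n'
                         | NONE => env k)

(* existsAnd ns b1 b2: existentially quantify b1 /\ b2 over the variables ns. *)
fun existsAnd [] (b1 : bdd) (b2 : bdd) : bdd = (fn env => b1 env andalso b2 env)
  | existsAnd (n :: ns) b1 b2 =
      fn env =>
        List.exists (fn x => existsAnd ns b1 b2 (fn k => if k = n then x else env k))
                    [false, true]

val vs = [0, 1] and vs' = [2, 3]        (* current and primed state variables *)
val toPrimed = ListPair.zip (vs, vs')
val swap = toPrimed @ ListPair.zip (vs', vs)

(* exists v'. P(v') /\ R(v',v): the image of P under R, as a predicate on v. *)
fun image (bP : bdd) (bR : bdd) : bdd =
  existsAnd vs' (replace toPrimed bP) (replace swap bR)

(* Read a two-bit number from variables lo and hi. *)
fun num (env : int -> bool) (lo, hi) =
  (if env lo then 1 else 0) + (if env hi then 2 else 0)

val bR : bdd = fn env => num env (2, 3) = (num env (0, 1) + 1) mod 4  (* v' = v+1 mod 4 *)
val bP : bdd = fn env => num env (0, 1) = 0                            (* P = {0} *)

fun holdsAt (b : bdd) k = b (fn 0 => k mod 2 = 1 | 1 => k div 2 = 1 | _ => false)
val result = List.filter (holdsAt (image bP bR)) [0, 1, 2, 3]
```

The value result is [1]: the only successor of state 0 under the counter relation. No variables beyond the four already allocated were needed, which is the purpose of renaming into the primed variables before quantifying.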
The ML function iterateToFixedPoint2 defined below takes the transitive
closure of R until a fixed-point is reached, returning a triple (th,tb,i_tm) where
th is ⊢ S i (v1,...,vn) = S (i+1) (v1,...,vn), tb is S i (v1,...,vn) ↦ bi (where
bi is the BDD computed) and i_tm is “i”

fun iterateToFixedPoint2 S_suc tbR tb i =
 let val i_tm = intToTerm i
     val tb'  = BddEqMp
                 (SYM(SimpNum(SPEC i_tm S_suc)))
                 (BddOr(tb, BddImage tb tbR))
 in
  (TermBddOracle(BddEq(tb,tb')), tb, i_tm)
  handle oracleError => iterateToFixedPoint2 S_suc tbR tb' (i+1)
 end


The representation judgement (∃i. S i (v1,...,vn)) ↦ bi is computed by

let val (th,tb,i_tm) =
     iterateToFixedPoint2 S_suc (BddDef R_def) (BddDef S_0) 0
    val FpThi = SimpNum(SPEC i_tm FpTh)
in
 BddEqMp (SPEC_ALL(MP FpThi (GEN_ALL th))) tb
end

where SPEC_ALL strips off all outermost universal quantifiers (inverse of GEN_ALL).
Note the combination of HOL98 deduction (SPEC, SPEC_ALL, GEN_ALL, MP) and
BDD calculation (BddEqMp).

The function iterateToFixedPoint2 is much more efficient on large exam- 
ples than iterateToFixedPoint. For example, with iterateToFixedPoint, the 
calculation of the BDDs of the sets of reachable states of peg solitaire brings my 
500MB Linux box to a halt thrashing after a couple of days, and only reaches ab- 
out 15 steps. Using iterateToFixedPoint2 instead, all 32 steps are completed 
in a few hours. 


4.3 Future Possibilities 


The rules given above all have the same variable map ρ in the hypotheses and
conclusion. The following experimental rules, which are currently not implemented
(and so are not given ML names), are not of this form.

Let Frees(t) denote the set of free variables in t. Write ρ1 ⊑ ρ2 if ρ1 is a
restriction of ρ2 (i.e. Domain(ρ1) ⊆ Domain(ρ2)). The following rule then holds.

     ρ1 t ↦ b    ρ2 ⊑ ρ1    Frees(t) ⊆ Domain(ρ2)
    -----------------------------------------------
     ρ2 t ↦ b

It may be the case that the BDD representing a term doesn’t depend on
some variables in the term. For example, if {} denotes the undefined-everywhere
function, then {} t ∨ ¬t ↦ TRUE holds for any term t. Thus any entries in the
variable map that map variables to numbers not occurring in the support of the
BDD can be pruned. Let Support(b) be the set of BDD variables (i.e. numbers)
in b and let Range(ρ) denote the range of ρ (i.e. Range(ρ) = {n | ∃v. ρ(v) = n}).
Then

     ρ1 t ↦ b    ρ2 ⊑ ρ1    Support(b) ⊆ Range(ρ2)
    ------------------------------------------------
     ρ2 t ↦ b

Judgements with different variable maps can be combined if the maps are
compatible. Define

    Compatible(ρ1,ρ2) = ∀v n1 n2. (ρ1(v) = n1) ∧ (ρ2(v) = n2) ⇒ (n1 = n2)
Compatible maps ρ1 and ρ2 can be unambiguously joined to form a map ρ1 ∪ ρ2.
Rules for combining judgements with compatible maps can be formulated, for
example

     ρ1 t1 ↦ b1    ρ2 t2 ↦ b2    Compatible(ρ1,ρ2)
    ------------------------------------------------
     ρ1 ∪ ρ2  t1 ∧ t2 ↦ b1 AND b2

The rules for other binary operators are similar.
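
As a minimal sketch of what Compatible and the join of two maps amount to, variable maps can be modelled as association lists from variable names to numbers. This representation and the names varmap, lookup, compatible and join are assumptions made for illustration; the rules above are not implemented in HolBddLib.

```sml
type varmap = (string * int) list     (* hypothetical representation of a map *)

fun lookup (v : string) (m : varmap) =
  Option.map (fn (_, n) => n) (List.find (fn (x, _) => x = v) m)

(* Compatible(p1,p2): wherever both maps are defined they agree. *)
fun compatible (p1 : varmap) (p2 : varmap) =
  List.all (fn (v, n1) => case lookup v p2 of
                              SOME n2 => n1 = n2
                            | NONE => true) p1

(* Join of two maps: SOME of the union if compatible, NONE if they clash. *)
fun join p1 p2 =
  if compatible p1 p2
  then SOME (p1 @ List.filter (fn (v, _) => not (isSome (lookup v p1))) p2)
  else NONE

val joined  = join [("x", 0), ("y", 1)] [("y", 1), ("z", 2)]
val clashed = join [("x", 0)] [("x", 3)]
```

Here joined is SOME [("x",0),("y",1),("z",2)] and clashed is NONE. The sketch checks only the condition stated in the definition of Compatible; it does not check that distinct variables receive distinct numbers.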

The next major stage in the development of HolBddLib will be to support
judgements with different variable orders: i.e. calculation with ρ t ↦ b with
several different ρs in play, rather than just ρG t ↦ b. This will enable different
variable orderings to be used within a single session. It is also planned to extend
the HOL98 theory management system so that judgements ρ t ↦ b can be
saved to disk and later reloaded, just like theorems of higher order logic.


5 Related Work 


The Voss system [16] has strongly influenced the ideas described here. Voss con- 
sists of a lazy ML-like functional language, called FL, with BDDs as a built-in 
datatype. Quantified boolean formulae can be input and are parsed to BDDs. 
The normal boolean operations ¬, ∧, ∨, ⇒, ∀, ∃ are interpreted as BDD operations.
Algorithms for model checking are easily programmed.

Joyce and Seger interfaced an early HOL system (HOL88) to Voss and in 
a pioneering paper showed how to verify complex systems by a combination of 
theorem proving deduction and symbolic trajectory evaluation (STE) [9]. The 
HOL-Voss system integrates HOL88 deduction with BDD computations. BDD 
tools are programmed in FL and can then be invoked by HOL-Voss tactics, which 
can make external calls into the Voss system, passing subgoals via a translation 
between the HOL88 and Voss term representations. 

In later work Lee, Seger and Greenstreet [11] showed how various optimised 
BDD algorithms could be programmed in FL. 

The early experiments with HOL-Voss suggested that a lighter theorem pro- 
ving component was sufficient, since all that was really needed was a way of 
combining results obtained from STE. A system based on this idea, called VossProver,
was developed by Carl Seger and his student Scott Hazelhurst. It pro-
vides operations in FL for combining assertions generated by Voss using proof 
rules corresponding to the laws of composition of the temporal logic asserti- 
ons verified by STE [8]. VossProver was used to verify impressive integer and 
floating-point examples (see the DAC98 paper by Aagaard, Jones and Seger [1] 
for further discussion and references). 

After Seger and Aagaard moved to Intel, the development of the Voss and 
VossProver systems evolved into a new system called Forte. Only partial de- 
tails of this are in the public domain [14,2], but a key idea is that FL is used 
both as a specification language and as an LCF-style metalanguage. The connec- 
tion between symbolic trajectory evaluation and proof is obtained via a tactic 
Eval_tac that converts the result of executing an FL program performing STE 
into a theorem in the logic. Theorem proving in Forte is used both to split goals
into smaller subgoals that are tractable for model checking, and to transform
formulae so that they can be checked more efficiently. 

The combination of HOL98 and BuDDy described here provides a similar 
programming environment to Voss’s FL (though with eager rather than lazy 
evaluation). BuDDy provides BDD operations corresponding to ¬, ∧, ∨, ⇒, ∀,
∃ and the HOL98 term parser plus termToBdd provides a way of using these to
create BDDs from logical terms. Voss enables efficient computations on BDDs 
using functional programming. So does HolBddLib. However, in addition it al- 
lows FL-like BDD programming in ML to be intimately mixed with HOL98 
deduction, so that, for example, theorem proving tools (e.g. simplifiers) can be 
directly applied to terms to optimise them for BDD purposes (e.g. disjunctive 
partitioning). This is in line with future developments discussed by Joyce and 
Seger [9] and it appears that the Forte system has similar capabilities. 
Abstract. In this paper we present a library of transcendental functions
such as exp, log, cos, sin and tan and also an automated continuity
checker for real valued functions, both done using PVS. Our aim is to de- 
velop theorem proving support for computer algebra systems, and other 
applications which rely on mathematical analysis. The focus of the paper 
is on the actual development done in PVS. 


1 Introduction 


The purpose of this paper is to describe our recent work on developing theorem 
proving tools to support computer algebra systems such as MAPLE and Mathe- 
matica in doing symbolic computation of integrals and differential equations. 

Many users of computer algebra systems are not in general concerned with 
developing mathematics, rather they take well-established theories for granted 
and want tools that are as approachable as the ones they are familiar with: 
Numerical packages like Simulink or the NAG library or symbolic computation 
systems like MAPLE or Mathematica. Clearly when handling parametric cases, 
using numerical tools is not an option unless one is content with an experimen- 
tal answer. However, there are some areas, including integration and solving 
differential equations, that symbolic computation systems do not handle well. In 
this paper we discuss a tool supporting symbolic computation systems in these
areas. Descriptions of some applications of our tool can be found in [1,2].

Harrison [4] developed a large library of real analysis in HOL-Light, con- 
structing the reals by using Dedekind cuts. Harrison implemented the theory 
of convergence nets and used these to define and reason about convergence of 
both functions and sequences. This forms the basis for power series and leads 
on to transcendental functions such as exp, log, cos, sin and tan which were 
all implemented. Furthermore a large collection of properties of these functions 
were proved. In PVS Dutertre [3] did an implementation of basic real analysis 
building on the axiomatic definition of the reals provided in PVS. Dutertre’s 
implementation includes definitions of convergence, continuity and differentiabi- 
lity of real-valued functions and all the usual theorems about these that can be 
found in mathematics text books. 

Based on Dutertre’s analysis library in PVS and using Harrison’s HOL-Light 
library as a guide we have developed a theory of transcendental functions in 
PVS. In high school mathematics, trigonometric functions such as cos, sin and
tan are normally defined using triangles and angles, but they can also be defined 
by power series; this was the approach taken by Harrison [4] and we have also 
used this method. 

As argued in [1], computer algebra systems do not handle definite integration
with parameters well, and this is partly due to them not handling side conditions
such as continuity. We have developed a basic continuity checker based on
Dutertre’s real analysis library in PVS. The continuity checker uses general theorems
such as "the sum of two continuous functions is continuous". A high level
of automation is vital if the checker is to fulfill its role of support to a computer 
algebra system used in applied mathematics, and with PVS we have obtained a 
checker which utilizes the type system of PVS and is completely automatic. 

We begin by describing some of the features in PVS which are most important 
to our development, particularly that of the continuity checker. Section 3 gives 
an overview of our implementation of the transcendental functions. In Sect. 4 
we discuss two different approaches to automating continuity checking in PVS. 
Finally we discuss some applications of our work (Sect. 5). 


2 Types and Judgements in PVS 


PVS [6,7,8,9] is a specification and verification tool based on higher order logic. 
It is strongly typed and supports subtypes and dependent types. PVS specifi- 
cations are organized in theories, which may be parametric. This allows us to 
write theories about eg. functions defined on some subset of the reals without 
restricting the theory to a certain subset. In this section we outline the basic 
uses of the type system and how judgements are used to aid typechecking. We 
also briefly explain how one can use the PVS strategy language to direct proofs. 


2.1 Types 


PVS contains primitive types such as real numbers and booleans and the usual 
constructors for function, record, and tuple types. For example, [real, nat -> 
real] is the type of functions from pairs of reals and nats to reals. PVS also 
supports abstract datatypes [8]. 

PVS supports two different ways of declaring subtypes: either (i) using a 
boolean expression as in 


negreal : TYPE = {x : real | x < 0} 


which declares negreal to be the type of negative reals, or (ii) by declaring an 
uninterpreted subtype as in 


s : TYPE from t 


which declares s to be a subtype of t. 
Since the user can give arbitrary boolean expressions in type declarations 
typechecking is undecidable. Therefore Type Correctness Conditions (TCCs)
are generated during typechecking. Some of these might be discharged by PVS
automatically, but others might be left for the user to prove. For example,
typechecking the function definition


divi(x:negreal) : real = 1/x


raises this TCC


divi_TCC1: OBLIGATION (FORALL (x: negreal): x /= 0)


This is because division is of the type [real, nzreal -> real] (where nzreal 
is the type of non-zero reals). 


2.2 Judgements 


Judgements are used to help the typechecker discharge some of the many TCCs 
that might occur when using subtyping. Considering again the function divi 
from above, we need a judgement asserting that negreal is not only a subtype 
of real but indeed a subtype of nzreal. The following judgement (from PVS's
real library) does just that: 


negreal_is_nzreal: JUDGEMENT negreal SUBTYPE_OF nzreal 


The judgement is then used by the typechecker, and so the user will not be asked 
to prove the TCC divi_TCC1. 

The judgements are a powerful tool, since type correctness is essential for 
applying any theorems during automatic proving. In Sect. 4 we will give an 
example of how judgements are used in this way to check functions for continuity. 


2.3 Overloading Operators 


A useful feature of PVS is overloading of operators. Since PVS is strongly typed 
it is easy to determine which version of an operator is being used. For instance, + 
is defined on numbers but Dutertre [3] overloads this to be defined on functions 
too: 


+(f1, f2) : [T -> real] = LAMBDA x : f1(x) + f2(x);


So + is defined on functions from the type T to the reals. However as no infor- 
mation about judgements is carried over from the existing operator, a full set 
of judgements for the new operator might be necessary. An example of this is 
explained in Sect. 4.3. 


200 H. Gottliebsen 


2.4 Strategies 


The PVS prover contains high level proof commands but also supports a strategy 
language which allows users to write their own proof strategies [9].


Example 1. 


bar [ T : TYPE ] : THEORY
BEGIN
  f : VAR [T -> T]
  bar-lemma : THEOREM
    *some theorem*
END bar


This theory has the parameter T, which is then used in the declaration of the 
function f. 


(defstep foo (foo-arg)
  (let ((foo-lemma (format nil "bar-lemma[~a]" foo-arg)))
    (TRY (lemma foo-lemma) (GRIND) (SKIP)))
  "If bar-lemma[foo-arg] succeeds and produces sub-goal(s)
   then run grind otherwise skip"
  "Tries to apply bar-lemma with the right theory instantiation")


The name of this strategy is foo; it takes one argument foo-arg. If we want to 
apply the strategy when using functions of eg. type [bool -> bool] we must 
specify that we want the actual parameter of bar to be bool. This is handled 
by foo-arg, when the strategy is used as follows: 


(foo "bool")


The first part of the let-expression binds the name foo-lemma to the string
bar-lemma[bool]. The second part is an application of the built-in strategy TRY,
which tries to apply its first argument to the current sub-goal. If this succeeds,
TRY applies the second argument to each new sub-goal; if not, it applies the
third argument instead.
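
The control flow of TRY described above can be mimicked outside PVS. The following Python sketch is an illustration only, under simplifying assumptions: goals are plain strings, a step is a function returning the list of remaining sub-goals (or None when it does not apply), and the names try_step, lemma_step, grind and skip are hypothetical stand-ins, not part of the PVS strategy language.

```python
from typing import Callable, List, Optional

Goal = str
# A step maps a goal to its remaining sub-goals, or None if it does not apply.
Step = Callable[[Goal], Optional[List[Goal]]]


def try_step(first: Step, on_success: Step, on_failure: Step) -> Step:
    """Model of (TRY first on_success on_failure) as described in the text."""
    def run(goal: Goal) -> Optional[List[Goal]]:
        subgoals = first(goal)
        if subgoals is None:
            # first did not apply: fall back to the third argument
            return on_failure(goal)
        remaining: List[Goal] = []
        for sub in subgoals:
            result = on_success(sub)
            # a sub-goal that on_success cannot handle is left open
            remaining.extend([sub] if result is None else result)
        return remaining
    return run


def lemma_step(name: str) -> Step:
    """Hypothetical step: applies only to goals that mention the lemma name."""
    def run(goal: Goal) -> Optional[List[Goal]]:
        return [goal + " [using " + name + "]"] if name in goal else None
    return run


def grind(goal: Goal) -> Optional[List[Goal]]:
    """Hypothetical stand-in for GRIND: closes every goal it is given."""
    return []


def skip(goal: Goal) -> Optional[List[Goal]]:
    """Stand-in for SKIP: leaves the goal unchanged."""
    return [goal]


# Counterpart of (foo "bool") in this toy model.
foo = try_step(lemma_step("bar-lemma[bool]"), grind, skip)
```

In this toy model a goal that mentions bar-lemma[bool] is closed, while any other goal is returned unchanged, which is the behaviour the documentation strings of foo describe.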


By writing application specific strategies it is possible to gain a very high level 
of control over proofs and keep them automatic at the same time. Mechanisms 
are in place for inspecting goals and using the information to determine what the
next step within a given strategy should be [9]. However, in many cases the high 
level proof commands available in PVS are suitable on their own. 


3 Library of Trigonometric Functions 


PVS includes the real numbers as part of the provided system. Together with 
the definitions is a large collection of theorems about linear expressions of real 
numbers, including the usual lemmas about eg. commutativity and associativity 
and also cancellation rules for equations and inequalities. Whereas this collection
includes most basic rules, such as $0 * x = 0$, $1/x < 0$ iff $x < 0$, and $x * y < z * y$
iff ($x < z$ and $y > 0$) or ($z < x$ and $y < 0$), it is still lacking similarly simple
lemmas which are useful for doing analysis, eg. $|c| < 1 \Rightarrow c * x < x$ where $x > 0$.
However, there is enough of a foundation to enable the development of those 
lemmas. 

PVS also has an exponentiation function. It is restricted to take non-negative 
integer powers of real numbers. With this definition there is a limited selection 
of theorems about the power function such as $x^n > 0$ for $x > 0$ and $x^n \neq 0$ for
$x \neq 0$.

As we shall see later another useful existing type is that of the polymorphic 
sequence. This has all the usual operations such as first, rest, delete and insert. 


3.1 Existing Analysis Library 


PVS’ definition of the reals forms the basis of Dutertre’s library [3] for basic 
analysis. This library contains many basic definitions and theorems used in real 
analysis, including sequences of reals; convergence of functions and of sequences; 
continuity and differentiation. Below we briefly outline the contents of the library 
for each of these areas of real analysis. 


Sequences of Reals. The library proves theorems about sequences of reals, such as
increasing or decreasing sequences, extracting a subsequence, and how subsequences
inherit properties such as boundedness. It also contains a theory defining convergence
of sequences and gives various criteria for a sequence to be convergent. Finally
the limits of the usual combinations of sequences are given, eg. the limit of
$s_1(n) + s_2(n)$ is the sum of the limits of $s_1(n)$ and $s_2(n)$.


Limits of Functions. The limit of a real-valued function is defined by the usual 
$\epsilon$-$\delta$ definition. Again lemmas are given for calculating limits of combinations of
functions. Finally various bounds on limits are listed. 


Continuous Functions. As with the limits, the ordinary $\epsilon$-$\delta$ definition is used
and we see which operations on functions preserve continuity. By considering a 
continuous function restricted to a subinterval of its domain further theorems 
are proved, eg. the intermediate value theorem. 


Differentiation. Again Dutertre uses the standard definition using the Newton 
quotient: A function f is differentiable at x iff 


\[ \frac{f(x + \Delta x) - f(x)}{\Delta x} \tag{1} \]


has a limit as $\Delta x$ tends to 0. The derivative is then defined wherever the function
is differentiable; it takes the value of the limit. The fact that a differentiable
function is continuous is established and again we have all the usual rules for
combining functions and preserving differentiability, including the values of the 
derivatives. The value of the derivative at a maximum or a minimum is also 
given, as is the mean value theorem. 

In addition to the analysis library Dutertre has also built a theory to handle 
roots. This implementation covers positive integer roots of nonnegative reals and 
provides a useful extension of the power functions native to PVS as it provides 
a way to handle rational powers. 


3.2 The Extension 


Dutertre’s library supports rational functions, that is functions made up of the 
identity functions, constants and the combinators +, — (unary as well as binary), 
* and /. We want to provide support for functions such as exp, log, cos, sin and 
tan, which are called transcendental functions. Combinations of rational and 
transcendental functions are called elementary functions, eg. 


\[ f(x) = \frac{\cos(x)}{\exp(x + a)} \tag{2} \]

In high school the trigonometric functions are described using triangles and 
angles, but one can also define them by certain power series. This allows for 
analytical treatment of them. Following the usual mathematical development of 
the transcendental functions as limits of certain power series, we first develop
a theory of partial sums, then consider sequences of these to determine conver- 
gence criteria for series. Particularly useful for this extension is the theory of 
convergence of sequences already available in Dutertre’s library. We then define 
transcendental functions by their power series and via the power series prove a 
collection of theorems about the functions. 

In general our definitions and lemmas are equivalent to Harrison’s, but as we 
use Dutertre’s library as a basis and not Harrison’s more general approach, using 
convergence nets, our implementation is not quite as extensive. However, for 
the domain of transcendental functions, the two implementations do correspond 
closely. 

Below we describe the various definitions and properties in the extension. 
Many more properties of the functions have been proved, but space does not 
allow us to include them all here. 


Partial Sums. Our definition of partial sums is a little unusual as sum(n,m)(f)
is


\[ \sum_{i=n}^{n+m-1} f(i) \tag{3} \]


However, this definition proves to be more flexible than the usual one starting 
the summation from 0, allowing many theorems about partial sums to be more 
easily proved.

In PVS we define the sum-function using recursion:


sumc(n,m,f) : RECURSIVE real =
  IF m = 0 THEN 0
  ELSE sumc(n,m-1,f) + f(n+m-1)
  ENDIF
  MEASURE m

sum(n,m)(f) : real = sumc(n,m,f)


Major theorems useful in the further development include: 


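\[ \left| \sum_{i=n}^{n+m-1} f(i) \right| \leq \sum_{i=n}^{n+m-1} |f(i)| \tag{4} \]

\[ \sum_{i=n}^{n+m-1} (f(i) + g(i)) = \sum_{i=n}^{n+m-1} f(i) + \sum_{i=n}^{n+m-1} g(i) \tag{5} \]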
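
The recursive definition can be tried out directly. The following Python sketch mirrors sumc and sum with exact rational arithmetic and checks the triangle inequality and additivity of partial sums on sample data; the name psum, the sample sequences f and g, and the use of Fraction are choices made for this illustration and are not taken from the PVS library.

```python
from fractions import Fraction
from typing import Callable

Seq = Callable[[int], Fraction]


def sumc(n: int, m: int, f: Seq) -> Fraction:
    """Mirror of the recursive PVS definition: f(n) + ... + f(n+m-1)."""
    if m == 0:
        return Fraction(0)
    return sumc(n, m - 1, f) + f(n + m - 1)


def psum(n: int, m: int) -> Callable[[Seq], Fraction]:
    """Curried form corresponding to sum(n,m)(f)."""
    return lambda f: sumc(n, m, f)


def f(i: int) -> Fraction:
    """Sample sequence with alternating signs (arbitrary choice)."""
    return Fraction((-1) ** i, i + 1)


def g(i: int) -> Fraction:
    """Second sample sequence (arbitrary choice)."""
    return Fraction(i, 3)
```

Because the summation starts at an arbitrary index n and runs over m terms, adjacent blocks compose without index shifting: psum(0, 4)(f) + psum(4, 3)(f) equals psum(0, 7)(f). This is one way to read the remark that the definition is more flexible than one fixed to start at 0.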


Series. A series converges if the sequence of its partial sums converges. In that 
case, the sum also has the value of the limit of the sequence. 


sums(f,s) : bool 
= convergence(LAMBDA r : sum(0,r)(f),s) 
summable(f) : bool = EXISTS s : sums(f,s) 
suminf(f : {g|summable(g)}) : real = epsilon(LAMBDA s : sums(f,s)) 


The operator epsilon is the choice operator of PVS. Here it is used to extract 
the s such that sums (f,s) ie. to extract the value of a convergent series. 

Amongst the main theorems proved in Harrison’s work and also proved for 
our extension of Dutertre’s library are: 


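\[ \text{If } \sum_{i=0}^{\infty} f(i) \text{ converges then so does } \sum_{i=0}^{\infty} f(i + k), \tag{6} \]

\[ \text{If } f_0 = \sum_{i=0}^{\infty} f(i) \text{ and } g_0 = \sum_{i=0}^{\infty} g(i) \text{ then } \sum_{i=0}^{\infty} (f(i) + g(i)) = f_0 + g_0. \tag{7} \]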

Results similar to (7) also hold for subtraction, negation, and multiplication and 
division by a non-zero constant. 


Convergence Criteria. With the foundations laid we can now go on to use 
these theorems. Without aid in determining convergence one would have to go 
back to the definition for each series. This would be an unreasonable burden to 
put on any user, so we develop the following convergence criteria for series:

Cauchy-type $\sum_{i=0}^{\infty} f(i)$ is convergent iff

\[ \forall \epsilon > 0.\ \exists N.\ \forall n \geq N, m > 0.\ \left| \sum_{i=n}^{n+m-1} f(i) \right| < \epsilon. \]

Comparison If $\sum_{i=0}^{\infty} g(i)$ converges and $\exists N\, \forall n \geq N.\ |f(n)| \leq g(n)$ then $\sum_{i=0}^{\infty} f(i)$
converges.

Ratio Test If $\forall n \geq N.\ |f(n+1)| \leq c * |f(n)|$ for some $N$ and $c < 1$, then $\sum_{i=0}^{\infty} f(i)$
converges.


Power Series. As we are particularly interested in power series, we prove some 
more theorems regarding convergence of them: 


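\[ \sum_{i=0}^{\infty} x^i = \frac{1}{1 - x} \text{ for } |x| < 1. \]

\[ \text{If } \sum_{i=0}^{\infty} f(i) * x^i \text{ converges and } |y| < |x| \text{ then } \sum_{i=0}^{\infty} f(i) * y^i \text{ converges.} \]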


We also prove a theorem about differentiation of power series. 
Finally we are ready to begin to define the power series describing transcen- 
dental functions. 


exp and log. exp is defined in the following way: 


\[ \exp(x) = \sum_{i=0}^{\infty} \frac{x^i}{i!}. \tag{8} \]


The convergence of exp is proved using the ratio test. 
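
As a numerical illustration of why the ratio test applies here: for the terms f(i) = x^i / i! we have |f(n+1)| = |x|/(n+1) * |f(n)|, so the bound holds with c = 1/2 as soon as n + 1 >= 2|x|. The Python sketch below checks this on one sample value; the function names, the sample x = 5/2 and the choice c = 1/2 are assumptions made for this example and say nothing about how the PVS proof is organised.

```python
import math
from fractions import Fraction


def exp_term(x: Fraction, i: int) -> Fraction:
    """i-th term x^i / i! of the power series for exp."""
    return x ** i / math.factorial(i)


def ratio_bound_holds(x: Fraction, c: Fraction, start: int, count: int) -> bool:
    """Check |f(n+1)| <= c * |f(n)| for n = start, ..., start+count-1."""
    return all(
        abs(exp_term(x, n + 1)) <= c * abs(exp_term(x, n))
        for n in range(start, start + count)
    )


def exp_partial(x: Fraction, terms: int) -> Fraction:
    """Partial sum of the first `terms` terms of the series."""
    return sum((exp_term(x, i) for i in range(terms)), Fraction(0))


x = Fraction(5, 2)
half = Fraction(1, 2)
N = math.ceil(2 * abs(x))   # from this index on, |x|/(n+1) <= 1/2
```

A finite check of this kind is of course not a proof; it only shows the shape of the inequality that the ratio test needs.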

We can now prove that exp is differentiable with the expected results. From 
this follows also that it is continuous everywhere. 

For both exp and log, there is a large collection of well known facts as can be 
found in many text books. We have proved a useful collection of these. Firstly 
about exp: 


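\[ \exp(0) = 1, \tag{9} \]
\[ \forall x\, \forall y.\ \exp(x + y) = \exp(x) \exp(y), \tag{10} \]
\[ \forall y.\ 0 < y \Rightarrow \exists x.\ \exp(x) = y, \text{ where } x \text{ is unique (surjective on } \mathbb{R}_+). \tag{11} \]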


This leads us on to log, which is defined on the positive reals by using the 
choice operator epsilon in PVS 


log(x) : real = epsilon(LAMBDA y : exp(y) = x)


So log(x) gives a y such that exp(y) = x, and we know from the theorem above
that this y is unique.
There are a few theorems about log; we present just a couple of them: 


\[ \log(1) = 0, \tag{12} \]
\[ \forall x > 0\, \forall y > 0.\ \log\left(\frac{x}{y}\right) = \log(x) - \log(y). \tag{13} \]
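
The idea of defining log as "the y with exp(y) = x" can be imitated numerically. The Python sketch below finds such a y by bisection, which works because exp is increasing; it is only an illustration, and the function name, the bracketing scheme and the tolerance are assumptions of this example. The choice operator of PVS does no search of this kind.

```python
import math


def log_by_inversion(x: float, tol: float = 1e-12) -> float:
    """Find y with exp(y) = x for x > 0 by bisection (exp is increasing)."""
    if x <= 0:
        raise ValueError("log is only defined on the positive reals")
    lo, hi = -1.0, 1.0
    # widen the bracket until exp(lo) <= x <= exp(hi)
    while math.exp(lo) > x:
        lo *= 2
    while math.exp(hi) < x:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if math.exp(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

With this function the two theorems about log listed above can be observed up to rounding, for instance log_by_inversion(6.0 / 2.0) agrees with log_by_inversion(6.0) - log_by_inversion(2.0) to within the tolerance.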


Trigonometric functions and $\pi$. We now define cos and sin by their power
series. To prove convergence of these series we used the comparison test and the
fact that the power series for exp is convergent.

sin is defined in the following way: 


\[ \sin(x) = \sum_{i=0}^{\infty} \frac{(-1)^i x^{2i+1}}{(2i+1)!}. \tag{14} \]


And cos is defined in the following way: 


\[ \cos(x) = \sum_{i=0}^{\infty} \frac{(-1)^i x^{2i}}{(2i)!}. \tag{15} \]


We proved a large collection of standard facts about cos and sin; here are 
some of the theorems: 


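\[ \sin(0) = 0, \tag{16} \]
\[ \cos(0) = 1, \tag{17} \]
\[ \forall x.\ \sin(x)^2 + \cos(x)^2 = 1, \tag{18} \]
\[ \forall x.\ -1 \leq \sin(x) \leq 1, \tag{19} \]
\[ \forall x.\ -1 \leq \cos(x) \leq 1, \tag{20} \]
\[ \forall x\, \forall y.\ \sin(x + y) = \sin(x) \cos(y) + \cos(x) \sin(y), \tag{21} \]
\[ \forall x.\ \sin(-x) = -\sin(x). \tag{22} \]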
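
The standard facts listed above can be observed numerically on truncated versions of the two power series. The Python sketch below is an illustration only: the function names and the truncation at 30 terms are assumptions of this example, and floating-point agreement is not a substitute for the PVS proofs.

```python
import math


def sin_series(x: float, terms: int = 30) -> float:
    """Partial sum of sum_i (-1)^i x^(2i+1) / (2i+1)!."""
    return sum((-1) ** i * x ** (2 * i + 1) / math.factorial(2 * i + 1)
               for i in range(terms))


def cos_series(x: float, terms: int = 30) -> float:
    """Partial sum of sum_i (-1)^i x^(2i) / (2i)!."""
    return sum((-1) ** i * x ** (2 * i) / math.factorial(2 * i)
               for i in range(terms))
```

For moderate arguments the truncated series satisfy the Pythagorean identity, the addition formula for sin and the oddness of sin up to rounding error.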


Now we want to give a definition of $\pi$. We first prove that there is a unique
$x$ between 0 and 2, such that $\cos(x) = 0$. Then $\pi$ is defined to be 2 times this $x$,
again using the choice operator of PVS.

There are various lemmas relating cos, sin and $\pi$:


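\[ \pi > 0, \tag{23} \]
\[ \cos(\pi) = -1, \tag{24} \]
\[ \sin(\pi) = 0, \tag{25} \]
\[ \forall x.\ \sin(x) = \cos\left(\frac{\pi}{2} - x\right), \tag{26} \]
\[ \forall k.\ \sin(k\pi) = 0, \tag{27} \]
\[ \forall x.\ \cos(x) = 0 \text{ iff } \left( (\exists \text{ odd } k.\ x = k\frac{\pi}{2}) \vee (\exists \text{ odd } k.\ x = -k\frac{\pi}{2}) \right), \tag{28} \]
\[ \forall x.\ \sin(x) = 0 \text{ iff } \left( (\exists \text{ even } k.\ x = k\frac{\pi}{2}) \vee (\exists \text{ even } k.\ x = -k\frac{\pi}{2}) \right). \tag{29} \]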
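
To illustrate the definition of $\pi$ as twice the zero of cos between 0 and 2, the following Python sketch locates that zero by bisection on a truncated power series. It is a numerical illustration only: the function names, the truncation at 30 terms and the use of floating point are assumptions of this sketch, and the uniqueness of the zero is what the PVS proof establishes, not the code.

```python
import math


def cos_series(x: float, terms: int = 30) -> float:
    """Partial sum of sum_i (-1)^i x^(2i) / (2i)!."""
    return sum((-1) ** i * x ** (2 * i) / math.factorial(2 * i)
               for i in range(terms))


def first_cos_zero(lo: float = 0.0, hi: float = 2.0, steps: int = 60) -> float:
    """Bisection for the zero of cos in (lo, hi); cos(lo) > 0 > cos(hi) is assumed."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if cos_series(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# pi is taken to be twice the zero found in (0, 2)
pi_approx = 2 * first_cos_zero()
```

The value obtained this way agrees with the usual constant to within floating-point accuracy, and evaluating the truncated cosine series at it gives a value close to -1.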


The function tan(x) is undefined for the values of x where cos(x) = 0, and we
just proved that those x are of the form $k\frac{\pi}{2}$ where k is an odd integer. Therefore
we define a new type, which will be the domain for tan:

x : VAR real 
k : VAR int 


cos_nz_type : NONEMPTY_TYPE 
= {x | FORALL k : x /= (2* k +1) * pi / 2} 


We can then define tan and prove some lemmas about it: 


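\[ \forall x.\ \cos(x) \neq 0 \Rightarrow \tan(x) = \frac{\sin(x)}{\cos(x)}, \tag{30} \]
\[ \tan(n\pi) = 0, \tag{31} \]
\[ \forall x.\ \cos(x) \neq 0 \Rightarrow \tan(-x) = -\tan(x), \tag{32} \]
\[ \forall x\, \forall y.\ (\cos(x), \cos(y), \cos(x + y) \neq 0) \Rightarrow \tan(x + y) = \frac{\tan(x) + \tan(y)}{1 - \tan(x) \tan(y)}. \tag{33} \]

If we want to differentiate a function which is not defined on the full set of
the reals, with the current implementation we have to restrict the function to
an interval around the point of interest, such that the function is defined on the
whole of that interval. So we prove the two following theorems:

\[ \forall x\, \exists k.\ k_1 < x < k_2 \wedge \left(\tan|_{(k_1,k_2)}\right)'(x) = \frac{1}{\cos(x)^2}, \text{ for } k_1 = k\pi - \frac{\pi}{2},\ k_2 = k\pi + \frac{\pi}{2}, \]

\[ \forall y\, \exists x.\ \cos(x) \neq 0 \wedge -\frac{\pi}{2} < x < \frac{\pi}{2} \wedge \tan(x) = y \wedge x \text{ is unique}. \]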


Transcendental Functions and Continuity Checking in PVS 207 


3.3. Example Proof 


The proofs in this development seem to fall into two categories. Either they
are small and quite easy, or they are more complicated and tend to get very
long. Proofs about properties of finite series fall into the first category, as they
tend to be simple induction proofs. As one might expect, a great part of
the theorems about properties of the trigonometric functions are also quite simple.
This is because once the basic tools are in place we no longer have to go back
to the $\epsilon$-$\delta$ definitions to prove convergence to certain values. Between these two
extremes of the development is a part which in general requires a lot of work. 
We will here only consider one of the easier proofs. 
We want to prove the following theorem: 


sin_cos : THEOREM 
FORALL x : sin(x) = cos(pi / 2 - x) 


These four lemmas are used in the proof:


cos_add : LEMMA
FORALL x, y : cos(x + y) = cos(x) * cos(y) - sin(x) * sin(y)


cos_pi2 : LEMMA 
cos(pi / 2) = 0 


sin_pi2 : LEMMA
sin(pi / 2) = 1


sin_neg : LEMMA 
FORALL x : sin(-x) = -sin(x) 


The proof is as follows: 


(SKOLEM!)
(USE "cos_add" ("x" "pi/2" "y" "-x!1"))
(LEMMA "cos_pi2")
(LEMMA "sin_pi2")
(USE "sin_neg" ("x" "x!1"))
(ASSERT)


It applies lemma cos_add to pi/2 and -x to get

cos(pi / 2 - x) = cos(pi/2) * cos(-x) - sin(pi/2) * sin(-x)

Then we use lemmas cos_pi2 and sin_pi2 to get

cos(pi / 2 - x) = 0 * cos(-x) - 1 * sin(-x)

And finally using sin_neg we get

sin(-x) = -sin(x)

so that cos(pi / 2 - x) = sin(x), and this completes the proof.


4 Automated Continuity Checking


In this section we describe how we can determine if a function $f : A \to \mathbb{R}$, where
$A \subseteq \mathbb{R}$, is continuous. For simple functions, this is an easy task as we apply
the basic rules about combining continuous functions using eg. addition. But 
considering more complicated functions, maybe using composition, it becomes 
a lot harder to control. This is one case where a theorem prover can provide 
support. Our aim is to have an automated continuity checker, in the sense that 
the user should be able to pose his problem and use just a single command to 
get some sort of answer. This would be useful in conjunction with eg. computer 
algebra systems [2]. 


4.1 General Idea 


We use Dutertre’s [3] continuous_functions theory as the basis for our implementation.
It builds on the well known definition of continuity,


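Definition 1. Let $f : A \to \mathbb{R}$ and $a \in A$. We say that $f$ is continuous at $a$ if
\[ \forall \epsilon \in \mathbb{R}_+\ \exists \delta \in \mathbb{R}_+ : \forall x \in A : |x - a| < \delta \Rightarrow |f(x) - f(a)| < \epsilon \]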


When checking if a certain function is continuous at some point we consider 
the term describing it, eg. $2 + 2x^2$. In this case it is a sum of two functions,
and we know that if each of the two functions is continuous, then so is the 
sum. The same holds for subtraction, multiplication and division, although in
the latter case we must also make sure the denominator is non-zero. It is clear 
that using this approach we can syntactically take the term apart, with the basic 
parts being constant functions and the identity function. 
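
The syntactic decomposition just described can be sketched as a small recursive function over terms. The Python code below is an illustration under stated assumptions, not the PVS strategy: the term classes, the function names and the example terms are hypothetical, and for division the sketch only inspects the value of the denominator at the point in question, whereas the PVS theorems require the denominator to be non-zero on the whole domain type.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Union


@dataclass(frozen=True)
class Const:
    value: Fraction


@dataclass(frozen=True)
class Ident:
    pass


@dataclass(frozen=True)
class BinOp:
    op: str            # one of "+", "-", "*", "/"
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Abs:
    arg: "Expr"


Expr = Union[Const, Ident, BinOp, Abs]


def evaluate(e: Expr, x: Fraction) -> Fraction:
    """Value of the term at x (raises ZeroDivisionError on division by zero)."""
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Ident):
        return x
    if isinstance(e, Abs):
        return abs(evaluate(e.arg, x))
    left, right = evaluate(e.left, x), evaluate(e.right, x)
    if e.op == "+":
        return left + right
    if e.op == "-":
        return left - right
    if e.op == "*":
        return left * right
    return left / right   # e.op == "/"


def continuous_at(e: Expr, a: Fraction) -> bool:
    """True means the combination rules establish continuity at a."""
    if isinstance(e, (Const, Ident)):
        return True                      # base cases
    if isinstance(e, Abs):
        return continuous_at(e.arg, a)
    ok = continuous_at(e.left, a) and continuous_at(e.right, a)
    if e.op == "/":
        # simplification: only the denominator's value at a is inspected
        return ok and evaluate(e.right, a) != 0
    return ok


X = Ident()
SUM_EXAMPLE = BinOp("+", X, BinOp("/", Const(Fraction(2)), X))   # x + 2/x
X_PLUS_1 = BinOp("+", X, Const(Fraction(1)))
QUOTIENT_EXAMPLE = BinOp("/", X_PLUS_1, X_PLUS_1)                # (x+1)/(x+1)
```

A result of False only means that the rules do not apply, not that the function is discontinuous. For instance the term (x+1)/(x+1) is rejected at -1 even though it can be extended continuously there.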

What is described here is clearly a very basic method, and by no means a 
complete one. For example, the function 


\[ f(x) = \frac{x + 1}{x + 1} \tag{34} \]

is not defined at $-1$ as such, and with this representation PVS’s typechecker
would require $x \neq -1$. However it is clear that if we add to the definition of $f$
so that $f(-1) = 1$, it could be simplified to


\[ f(x) = 1. \tag{35} \]


And by using L’Hospital’s rule on (34) we see that f is indeed continuous at $-1$.
Cases like these are not yet provided for in our PVS implementation. 

We want to use the PVS theory continuous_functions by Dutertre [3]. This 
theory contains theorems about conserving continuity under certain operations 
together with the base cases of constant functions and the identity function 
being continuous. The theory is parameterized with a type T, which is used as 
the domain-type for the functions. 

Let $f_1, f_2 : T \to \mathbb{R}$ and $g : T \to \mathbb{R} \setminus \{0\}$ be functions, let $x_0 \in T$ and let
$k \in \mathbb{R}$. We then use the following named theorems:


sum_continuous : THEOREM
continuous(f1, x0) and continuous(f2, x0)
implies continuous(f1 + f2, x0)


diff_continuous : THEOREM 
continuous(f1, x0) and continuous(f2, x0) 
implies continuous(f1 - f2, x0)


prod_continuous : THEOREM 
continuous(f1, x0) and continuous(f2, x0)
implies continuous(f1 * f2, x0)


const_continuous : THEOREM 
continuous(k, x0) 


scal_continuous : THEOREM 
continuous(f1, x0) implies continuous(k * f1, x0) 


opp_continuous : THEOREM 
continuous(f1, x0) implies continuous(- f1, x0) 


div_continuous : THEOREM 
continuous(f1, x0) and continuous(g, x0) 
implies continuous(f1/g, x0)


inv_continuous : THEOREM 
continuous(g, x0) implies continuous(1/g, x0) 


identity_continuous : THEOREM 
continuous(I[T], x0) 


abs_continuous : THEOREM 
continuous(f1, x) implies continuous(abs(f1), x)


We want to use these theorems in as automatic a way as possible, and so 
we have two options; either we write special strategies to direct the proof, or 
we provide enough information for PVS to match the theorems to the input. In 
Sect. 4.2 we outline how strategies might be used to solve the problem, then in 
Sect. 4.3 we see how even better results can be achieved by the use of judgements. 


4.2 Using Strategies 


In this section we describe a strategy which directs PVS in doing continuity 
checking. We will not give all the details, but rather give an overview of how the 
strategy works. 

There are two parts to the strategy, one which is called from the prover and 
another which is internal and only meant to be called from the top level strategy.
The top level strategy is called cts, and it does not take any arguments. The
first thing cts does is check if the current goal fits the strategy, in that it must
be of the form continuous(f,x).

In fact we allow for some variation, such as 


FORALL x : p(x) => continuous(f,x) 


where p is some predicate, but since these variations do not change the behavior 
of the main elements of the strategy, we will omit them in this overview. 

As the PVS theory used for continuity is parametric in the domain type of the 
function, we then work out what that type is. Because PVS is strongly typed, 
this information is obtainable from the goal itself and the strategy language 
provides means to extract it. 

The next step, which is done by a recursive strategy, is to apply the theorems 
of Sect. 4.1 in the correct order using the appropriate instantiations. Again PVS 
provides mechanisms for checking if a term is an application and the number 
of arguments it takes — we have to distinguish between the unary and binary 
versions of —. Having identified the top level symbol (+, —, *, /, |-|, identity or 
constant functions) we can then apply the appropriate theorem. For example, 
for $x \in \mathbb{R}$, consider continuous(LAMBDA x : x+2/x,3). Here “+” is the top
symbol, and so we first apply sum_continuous. This then gives us two sub-goals: 


continuous(LAMBDA x : x,3)
continuous(LAMBDA x : 2/x,3)


A Type Correctness Condition (TCC) is also generated:


FORALL (x : posreal) : x /= 0


But it is automatically discharged by PVS, so we do not have to handle it. In 
more complicated cases PVS might not be able to discharge the TCC automa- 
tically, so the strategy takes care of TCCs too. 

Each of the two sub-goals can be handled by the inner strategy. The first one 
is proved in the next step by using identity_continuous, whereas the second 
needs a few more steps. 

At present our strategy relies on PVS to work out the instantiation of the 
theorems, and whereas in simpler examples, such as 


continuous(LAMBDA x : 1/(|x|+1) + 2/(|x|+2),y)


this works well, it is not too hard to confuse PVS. There is a way to get around 
this, and that is to not only have the strategy decide (based on examining the 
current goal) which theorem to use next, but also the instantiation of it. Again, 
this information can be extracted from the goal, but the more detail we need to 
extract from the goal the more involved is the strategy, so we have not yet done 
this.


4.3 Using JUDGEMENTS


Interaction with the PVS team suggested that rather than writing ever more 
complicated strategies for continuity checking we should be looking to provide 
the PVS typechecker with the judgements needed to do all the matching. This 
method turned out to be surprisingly simple with very good results. 

By giving judgements, eg 


negreal_minus_nnreal_is_negreal: 
JUDGEMENT -(nx:negreal, nny:nnreal) HAS_TYPE negreal 


we tell PVS that a certain kind of expression has a certain type, in this case that
a negative real minus a non-negative real gives a negative real as a result.

For PVS to automatically (eg. while using the high level command GRIND) 
apply the theorems, the typechecker must be able to decide that the arguments 
are of the appropriate type, eg. to apply div_continuous, it must know the 
divisor to be a non-zero function. If the right judgements are in place, this will 
happen. 

The above judgement is concerned with the type of subtracting two real num- 
bers, but what we need in order to use the theorems of continuous_functions is 
judgements about the usual function combinations as higher order operators,
eg. subtraction of two functions, so that 


1 


f= Taya) -3 


(36) 
will be matched to div_continuous. 

Dutertre [3] already defined the higher order versions of +, — (unary and 
binary), *, /, and |-|, but we have added the following declarations and judge- 
ments: 


npfun : TYPE = [T -> npreal] 


ph, pg : VAR posfun 
nh, ng : VAR negfun 
nzh, nzg : VAR nzfun 
nnh, nng : VAR nnfun 
nph, npg : VAR npfun 


npfun_plus_npfun_is_npfun: JUDGEMENT +(nph, npg) HAS_TYPE npfun 
npfun_minus_nnfun_is_npfun: JUDGEMENT -(nph, nng) HAS_TYPE npfun 
npfun_times_npfun_is_nnfun: JUDGEMENT *(nph, npg) HAS_TYPE nnfun 
npfun_div_posfun_is_npfun: JUDGEMENT /(nph, pg) HAS_TYPE npfun 
npfun_div_negfun_is_nnfun: JUDGEMENT /(nph, ng) HAS_TYPE nnfun 
minus_npfun_is_nnfun: JUDGEMENT -(nph) HAS_TYPE nnfun 


We have also added similar judgements for nzfun, posfun, negfun, nnegfun, 
and judgements for the |-| and identity functions. These judgements are mainly
generalisations of the ones for real numbers found in the PVS distribution; however,
we added a few other useful ones too.
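
The effect of such judgements can be imitated by a small sign-type inference. The Python sketch below is an illustration only: it works with the pointwise sign types rather than with function types, the table holds just a handful of judgements (those quoted above plus two added for the example, for abs and for non-negative plus positive), and the names SUPERTYPES, JUDGEMENTS, infer and known_nonzero are hypothetical. The PVS typechecker does not work by table lookup of this kind.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Proper supertypes of each sign type (reflexivity is handled in is_subtype).
SUPERTYPES: Dict[str, Set[str]] = {
    "posreal": {"nnreal", "nzreal", "real"},
    "negreal": {"npreal", "nzreal", "real"},
    "nnreal": {"real"},
    "npreal": {"real"},
    "nzreal": {"real"},
    "real": set(),
}

# (operator, argument types) -> result type
JUDGEMENTS: Dict[Tuple[str, Tuple[str, ...]], str] = {
    ("+", ("npreal", "npreal")): "npreal",
    ("-", ("npreal", "nnreal")): "npreal",
    ("-", ("negreal", "nnreal")): "negreal",
    ("*", ("npreal", "npreal")): "nnreal",
    ("/", ("npreal", "posreal")): "npreal",
    ("/", ("npreal", "negreal")): "nnreal",
    ("neg", ("npreal",)): "nnreal",
    ("abs", ("real",)): "nnreal",          # added for this example
    ("+", ("nnreal", "posreal")): "posreal",  # added for this example
}


def is_subtype(s: str, t: str) -> bool:
    return s == t or t in SUPERTYPES[s]


def infer(op: str, *args: FrozenSet[str]) -> FrozenSet[str]:
    """All result types licensed by a judgement whose argument types match."""
    found = {"real"}
    for (j_op, j_args), result in JUDGEMENTS.items():
        if j_op == op and len(j_args) == len(args) and all(
            any(is_subtype(s, t) for s in have) for have, t in zip(args, j_args)
        ):
            found.add(result)
    return frozenset(found)


def known_nonzero(types: FrozenSet[str]) -> bool:
    """A denominator may be used if some inferred type is a subtype of nzreal."""
    return any(is_subtype(t, "nzreal") for t in types)
```

For example, with the identity restricted to the negative reals, infer("-", frozenset({"negreal"}), frozenset({"posreal"})) contains negreal, so a term of the form x - 3 would be accepted as a denominator, whereas the same term over all reals would not.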

By including these judgements we have successfully proved continuity
of functions such as: 


\[ \exp(x^2 + |1 - x|), \tag{37} \]
\[ \exp(\cos(x) + 1), \tag{38} \]
\[ \frac{1}{|x| + 1} + \frac{2}{|x| + 2}. \tag{39} \]
And defined only on the negative reals: 
1 


These proofs could all be done using only GRIND with the appropriate theories 
added, however to help the user even further we wrote a new strategy. It simply 
inspects the goal to obtain the domain type of the functions to be proved con- 
tinuous. It then instantiates all the parameterised theories correctly and calls 
GRIND. 

Using judgements for this problem has solved much of it in a very nice way. 
The judgements are short and easy to understand compared with specialised 
strategies. However, it seems likely that it will cause considerable difficulty to 
generalise the judgements to cover yet more functions. We would like to be able 
to handle functions with, say, a division with a trigonometric function such as
cos or sin in the denominator. In some cases, this would be obtainable, like in 


(os 


cos(x) + 2° a) 


Here we might consider judgements saying that cos(x) is between -1 and 1 and
that adding something strictly greater than 1 gives a positive result. As the denominator can
be arbitrarily complex, it is not clear that this is a viable solution in the more 
general case, but already the use of judgements by the typechecker has solved 
fairly complicated cases. 
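
The reasoning suggested for a denominator such as cos(x) + 2 amounts to bounding its range. The Python sketch below expresses this with a minimal interval type; it is a hypothetical illustration of the idea, not a description of how judgements are processed in PVS, and the names Interval, COS_RANGE and const are assumptions of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        """Range of a sum, given the ranges of the summands."""
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def excludes_zero(self) -> bool:
        """True if every value in the interval is non-zero."""
        return self.lo > 0 or self.hi < 0


COS_RANGE = Interval(-1.0, 1.0)      # from the theorem -1 <= cos(x) <= 1


def const(c: float) -> Interval:
    """Range of a constant function."""
    return Interval(c, c)
```

Here COS_RANGE + const(2.0) is the interval from 1 to 3, which excludes zero, while COS_RANGE + const(1.0) is the interval from 0 to 2, which does not. This matches the requirement that the added constant be strictly greater than 1.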


5 Applications 


We have successfully used the library of trigonometric functions with DITLU 
[1], a table look-up for symbolic definite integrals. The table works on integrals 
of the form 


\[ \int_b^c f(x)\, dx \tag{42} \]


where both the limits and the function may include parameters. Dependent on 
the function and the values (or ranges) of the parameters the integral might be
undefined or take a particular value. In the table this is represented as case
statements and we used PVS to eliminate the cases that could definitely not
occur for a given query. For example, one of the entries in the table is


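0            $(b = c)$
undefined    $(q \neq 0) \wedge (b \neq c) \wedge ((b = -\frac{p}{q}) \vee (c = -\frac{p}{q}))$
…            $(q \neq 0) \wedge (b \neq c) \wedge ((b \neq -\frac{p}{q}) \wedge (c \neq -\frac{p}{q}))$
…            $(b \neq c) \wedge (p \neq 0) \wedge (q = 0)$
undefined    $(b \neq c) \wedge (p = 0) \wedge (q = 0)$
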
So with a query like

\[ \int_{-a}^{a+1} \frac{1}{x}\, dx, \tag{43} \]


we get the following match with the entry above: 
b = -a, c = a + 1, p = 0, q = 1.


We see that dependent on the value of $a$ the first three cases might occur, case
1 if $a = -\frac{1}{2}$ and cases 2 and 3 if $a \neq -\frac{1}{2}$. However, the last two cases will not
occur with this query, since $q = 1$. So the result of the query is the first three
cases only. We used PVS to check these side conditions in order to remove the
cases which cannot occur.
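
The case elimination for this query can be illustrated by evaluating the five side conditions on sample values of a. The Python sketch below is only an illustration: the conditions are transcribed from the table entry as read here, the names CASES, pole, reachable_cases and SAMPLES are hypothetical, and sampling finitely many values is not a proof, which is why PVS was used for the actual check.

```python
from fractions import Fraction
from typing import Callable, Dict, Iterable, Set

Cond = Callable[[Fraction, Fraction, Fraction, Fraction], bool]


def pole(p: Fraction, q: Fraction) -> Fraction:
    """The point -p/q; only called when q is non-zero."""
    return -p / q


# Side conditions of the five cases, as functions of (b, c, p, q).
CASES: Dict[int, Cond] = {
    1: lambda b, c, p, q: b == c,
    2: lambda b, c, p, q: q != 0 and b != c and (b == pole(p, q) or c == pole(p, q)),
    3: lambda b, c, p, q: q != 0 and b != c and b != pole(p, q) and c != pole(p, q),
    4: lambda b, c, p, q: b != c and p != 0 and q == 0,
    5: lambda b, c, p, q: b != c and p == 0 and q == 0,
}


def reachable_cases(samples: Iterable[Fraction]) -> Set[int]:
    """Cases hit by the query b = -a, c = a + 1, p = 0, q = 1 for the sampled a."""
    hit: Set[int] = set()
    for a in samples:
        b, c, p, q = -a, a + 1, Fraction(0), Fraction(1)
        hit |= {k for k, cond in CASES.items() if cond(b, c, p, q)}
    return hit


# a = -4, -7/2, ..., 4 (includes -1/2, -1 and 0)
SAMPLES = [Fraction(n, 2) for n in range(-8, 9)]
```

On these samples only cases 1, 2 and 3 are ever hit, in line with the observation that cases 4 and 5 require q = 0.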

We have used the continuity checker together with experimental MAPLE 
code [2] to provide safer solutions to differential equations. In general computer 
algebra systems do not check all the side conditions of well-known theorems before
applying them. One such example is the Fundamental Theorem of Calculus 


Theorem 1. $\int_b^c f(x)\, dx = g(c) - g(b)$ where g is the antiderivative of f and f
is continuous on [b,c].

Applying this theorem without checking that f is continuous on the interval can 
lead to errors. The experimental MAPLE code returns the usual results, but also 
conditions on the results, for instance that some function is continuous on some 
particular interval. We used our continuity checker to prove these side conditions
for examples such as 


\[ y'(x) + \frac{y}{2\sqrt{x - a}} = \log(b - x) \exp(-\sqrt{x - a}). \]
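
A standard textbook example, not taken from the MAPLE code discussed here, shows what can go wrong when the continuity side condition of Theorem 1 is ignored. For f(x) = 1/x^2 with antiderivative g(x) = -1/x, the formula g(c) - g(b) on [-1, 1] gives a negative number although f is positive wherever it is defined. The Python sketch below uses hypothetical function names and a simple midpoint rule for comparison.

```python
def f(x: float) -> float:
    """Integrand; not continuous (not even defined) at 0."""
    return 1.0 / (x * x)


def g(x: float) -> float:
    """Antiderivative of f wherever x != 0."""
    return -1.0 / x


def naive_ftc(b: float, c: float) -> float:
    """g(c) - g(b) without checking the continuity side condition."""
    return g(c) - g(b)


def riemann_midpoint(b: float, c: float, n: int) -> float:
    """Midpoint rule; an even n keeps the sample points away from 0 on [-1, 1]."""
    h = (c - b) / n
    return sum(f(b + (k + 0.5) * h) for k in range(n)) * h
```

On [1, 2], where f is continuous, naive_ftc and the midpoint rule agree closely. On [-1, 1] the naive formula returns -2 while every midpoint sum is positive and grows with n, which is exactly the kind of error a continuity check is meant to catch.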


6 Discussion 


We have presented here the outline of our implementation of two different additions
to PVS: a library of transcendental functions and an automatic continuity
checker. The library of transcendental functions is similar to that of Harrison
[4], although our implementation is not built on convergence nets but on the
standard $\epsilon$-$\delta$ definition of convergence. The library includes definitions of the
functions exp, log, cos, sin, tan, $\cos^{-1}$, $\sin^{-1}$ and $\tan^{-1}$ and a large collection
of theorems about properties of these functions. 

The automatic continuity checker deals with continuity for a wide range of 
combinations of functions, including some from the library of transcendental 
functions. The drawback of relying on judgements and the built-in matching
in PVS is that more complicated expressions (e.g. in the denominator of
a division) may need more specialised types, and therefore further judgements. It
seems clear that a combination of strategies and judgements is the best approach 
to get even more general results. We are currently exploring this. 

Of further interest would be to consider an implementation of complex ana- 
lysis in PVS, as this would allow for proper treatment of phenomena like branch 
cuts. This would require a complete reworking of the analysis library, but would 
then support more aspects of analysis. 
Abstract. This paper outlines a formal model of the Intel IA-64 architecture,
and explains how this model can be used to verify the correctness
of assembly-level code optimizations. The formalization and proofs were
carried out using the HOL Light theorem prover.


1 Introduction 


Current microprocessors dynamically reorder the sequence of instructions being
executed to extract greater performance. The IA-64 takes a different approach [3,
6]. By exposing architectural features that would ordinarily be hidden, IA-64
allows the compiler to reorder instructions prior to execution. By moving the
burden of instruction reordering from hardware to software, resources are freed
to increase performance in other ways.

Formal methods have been applied to instruction reordering hardware [7], but
as the responsibility for reordering instructions moves from hardware to software,
so does the obligation to ensure the reorderings preserve the meaning of the
code. This paper proves the correctness of some of the instruction reorderings
performed in software for the IA-64. The work described deliberately stops
short of tackling the open-ended difficulty of verifying general properties of IA-64
programs. Instead, the proofs are limited to checking equivalence between
similar small programs, the kinds of proofs that typify verification of individual
optimizing transformations. The purpose of the work is to investigate the extent
to which such proofs can be automated. The formalization and proofs were
carried out with the HOL Light theorem prover [5].


2 Examples: Control and Data Speculation 


This paper will focus on two examples of instruction reordering. The first illustra- 
tes control speculation, where an instruction is executed even though the result is 
not known to be needed. The second example illustrates data speculation, where 
an instruction is executed using data not known to be accurate. 
original code                 optimized code

     (p1) br label                      ld8.s r9 r5
          ld8 r9 r5                (p1) br label
          add r2 r9 r3                  chk.s r9 reload
                              continue: add r2 r9 r3

                                reload: ld8 r9 r5
                                        br continue


Fig. 1. Control speculation example


2.1 Control Speculation 


The execution of most IA-64 instructions may be predicated on the value of a
one-bit predicate register. If the nominated register holds true, the instruction
executes normally; if not, it has no effect. Predicated instructions are written 
with the predicate register parenthesized to the left. An example can be seen in 
the first instruction of the ‘original’ code fragment in Fig. 1. 

Consider the original code presented in Fig. 1. If p1 holds true, then control 
branches to label. If not, execution falls through to the next instruction, which 
loads the general purpose register r9 with 8 bytes from the address held in r5. 
The values held in r9 and r3 are summed, and the result stored in r2. 

Load instructions take several cycles to complete, and in this program the 
load is followed by an add which depends upon the value loaded. The execu- 
tion of the add must therefore stall, to allow the load to complete before it can 
execute. We would like to hide the load latency by moving the load earlier in the
instruction stream. Unfortunately, we cannot execute the load earlier as it ap- 
pears immediately after a conditional branch; if the branch is taken the load 
should not be executed. It is tempting to think we could move the load before 
the branch and ignore the result if it is not needed; this would be a control spe- 
culative execution of the load. However, this could cause a fault if the load tried 
to access an invalid address. A correct, but unnecessary, load could also incur a 
performance penalty if it required nonresident memory to be swapped in. 

On a traditional architecture the load would stay where it was, but the IA-64 
offers a way around these problems. Every register has a corresponding one-bit 
tag called a not a thing (nat) bit. A control speculative version of the load 
instruction is provided, which quietly sets this bit rather than causing a fault 
(including a page fault). The nat bit can be checked to see if the load succeeded. 
Using this feature, the code can be optimized as shown in Fig. 1. 

The optimized code begins with a control speculative load, which attempts 
to read data into r9. If the load fails, the nat bit of r9 is set. Later, the chk.s 
instruction checks if r9 contains valid data. If the nat bit is clear, the load 
succeeded and execution continues unaffected. If it is set, execution is transferred 
to the code labeled reload, where the load is retried. The second load will exhibit 
the true faulting behavior, perhaps causing nonresident memory to be swapped 
in so the load can complete. Both paths then execute the add instruction. 
original code                 optimized code

          st2 r1 r2                     ld4.a r3 r4
          ld4 r3 r4                     st2 r1 r2
                                        chk.a r3 reload
                              continue:

                                reload: ld4 r3 r4
                                        br continue


Fig. 2. Data speculation example


2.2 Data Speculation 


Figure 2 gives another example optimization. Consider the unoptimized code.
The first instruction stores two bytes from r2 to the address held in r1. The
second loads four bytes into r3 from the address held in r4. As before, we would
like to hide the load latency from any subsequent instructions by moving the
load earlier in the instruction stream. However, this would result in incorrect
behavior if the memory region written by the store overlaps that read by the
load. In most situations the regions will not overlap, but they might, so the load
must remain after the store.

The IA-64 provides a way around this obstacle with a special data speculative 
advanced load instruction. The advanced load records the region of memory a 
register was loaded from. A test can be used to check if the region has been 
overwritten since the load. Using this feature, the code can be optimized as 
shown. The optimized code begins with an advanced load, followed by the store.
Next, register r3 is checked to see if it was affected by the store. If it was, a
branch is taken to the label reload, where it is reloaded with the correct data.¹


3 A Model of the IA-64


Our aim is to describe an abstract model of an IA-64 machine that can be
used to show that the optimized code fragments just presented have the same 
behavior as the corresponding unoptimized fragments. This section will describe 
each component of the IA-64 architectural state that needs to be modeled to 
verify these optimizations. 


3.1 Data Memory 


We make the simplifying assumption that the instruction and data memories 
can be modeled separately. The data memory is defined in terms of two types, 
word and size. The type word describes 64-bit words, which are used to hold 
both addresses and data. The type size describes the units in which memory is 
accessed: 1, 2, 4 or 8 bytes. 


¹ Other IA-64 instructions can handle these simple examples more succinctly. Here we
present only the most general forms of speculation and recovery.
word ≝ {n | n < 2^64}        size ≝ {1, 2, 4, 8}


The function zext takes a word w and a size s, and returns a new word with 
a zero-extended copy of the first s bytes of w. 


⊢def zext w s = w mod 2^(8×s)                                     [zext_def]


A word (address) and a size together describe a region of memory. The pre- 
dicate overlapped determines if two regions overlap. 


⊢def overlapped a₁ s₁ a₂ s₂ =                                     [overlapped_def]
       (∃x. a₁ ≤ x ∧ x < a₁ + s₁ ∧ a₂ ≤ x ∧ x < a₂ + s₂)
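The two definitions above can be read executably. The following is a minimal Python sketch under stated assumptions: words and sizes are plain Python integers, and the existential quantifier in overlapped is replaced by an equivalent comparison of interval bounds. It is an illustration only and is not the HOL Light formalization.

```python
WORD_LIMIT = 2 ** 64      # word = {n | n < 2^64}
SIZES = (1, 2, 4, 8)      # size = {1, 2, 4, 8}


def zext(w, s):
    """Zero-extended copy of the low-order s bytes of w."""
    return w % (2 ** (8 * s))


def overlapped(a1, s1, a2, s2):
    """True when some address x lies in both [a1, a1+s1) and [a2, a2+s2)."""
    # Two half-open intervals intersect exactly when the larger start
    # lies strictly below the smaller end.
    return max(a1, a2) < min(a1 + s1, a2 + s2)
```

For example, a four-byte region at address 0 overlaps a two-byte region at address 3, but not one at address 4.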


The data memory is modeled as a function from words (addresses) to words 
(values). The mem_read and mem_write operations are described as follows:


⊢def mem_read m a s = zext (m a) s                                [mem_read_def]


⊢def (mem_read (mem_write m a s w) a s = zext w s) ∧              [mem_write_def]
     (¬overlapped a₁ s₁ a₂ s₂ ⟹
        mem_read (mem_write m a₂ s₂ w) a₁ s₁ = mem_read m a₁ s₁)


Note that the behavior of reads and writes that access overlapping, but not 
identical, regions of memory is unspecified. Accurate modeling of such accesses 
is not necessary to verify optimizations like the ones discussed here. 
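Because mem_write is only partially specified, many concrete memories satisfy it. The Python sketch below shows one possible instance, chosen purely for illustration: a memory is a function from addresses to words, and a write replaces only the word stored at the written address. This particular choice for overlapping but non-identical regions is an assumption of the sketch and is not prescribed by the model.

```python
def zext(w, s):
    """Zero-extended copy of the low-order s bytes of w."""
    return w % (2 ** (8 * s))


def mem_read(m, a, s):
    """mem_read m a s = zext (m a) s, with the memory m given as a function."""
    return zext(m(a), s)


def mem_write(m, a, s, w):
    """One instance of mem_write: only the word at address a changes."""
    stored = zext(w, s)
    return lambda b: stored if b == a else m(b)


empty = lambda a: 0                      # hypothetical all-zero memory
m1 = mem_write(empty, 100, 2, 0x12345)   # store the low two bytes at 100
```

Reading back the same region returns zext w s, and a read from any region that does not overlap the written one is unchanged, which are the two clauses of the definition.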

Not all regions of memory are valid sources or destinations. The invalid regions
include those that extend outside the address space, but may include others as well.
⊢def mem_valid_source a s ⟹ a + s < 2^64                          [mem_valid_source_def]


⊢def mem_valid_dest a s ⟹ a + s < 2^64                            [mem_valid_dest_def]


Some regions of memory, called sequential regions, should be accessed only in the
order originally specified. If, for example, IO devices are mapped into the memory
space, those regions will be sequential. We do not specify which regions of memory
are sequential, only that some may be.


⊢def mem_seq a s ⟹ T                                              [mem_seq_def]


Not all regions of memory may be read speculatively. This includes invalid 
sources and sequential regions, but may include other regions as well. An obvious 
example is memory that is nonresident and would therefore need to be swapped 
in. The validity of speculatively accessing a memory region may change as the 
state of the machine changes. The mem_valid_spec_source predicate takes an extra 
parameter x to represent the abstract state of the machine. It is not necessary to 
specify how the value of mem_valid_spec_source depends on x, only that it may, 
and that the type of x is sufficiently large to encompass the state of the machine.
⊢def mem_valid_spec_source x a s ⟹                                [mem_valid_spec_source_def]
       mem_valid_source a s ∧ ¬mem_seq a s


Memory Access Ordering: It is not always possible to reorder memory acces- 
ses as described in Sect. 2.2. Code to synchronize multiple processes may depend 
on the precise ordering of those accesses. Changes to the memory access ordering
that appear correct when viewed from the perspective of an individual process
may not be correct when the collection of processes is considered as a whole.
The IA-64 provides special variants of the load and store instructions, and a
special ‘memory fence’ instruction for use in such routines. These instructions
must respect the memory access ordering. The ordinary load and store
instructions are not required to access memory in the order they were
issued. The memory accesses may be reordered or even coalesced by the hardware,
provided that the resulting access order satisfies read-after-write (RAW),
write-after-write (WAW), and write-after-read (WAR) data dependencies [6]. 
The optimizations considered here use only the ordinary versions of the load 
and store instructions, and so the memory model presented does not address 
access ordering. Mike Gordon has described a more elaborate memory model 
that encompasses memory access ordering issues for the Alpha architecture [4]. 


3.2 General Purpose Registers 


The IA-64 architecture defines 128 general purpose registers. Each register holds
a 64-bit word and a one-bit tag called a not a thing (nat) bit. The role of the
nat bit is to indicate when the data held in the register is invalid due to a failed
control speculation. These bits are set by failing control speculative loads, and
are propagated by operations that use invalid data as input. We will describe
the contents of a register with a record type:²


register ≝ <| val: word; nat: bool |>


These registers cannot necessarily all be accessed by the instructions of a
particular routine. Each routine has access to a subset of the registers known
as a frame. The current frame moves through the register file as subroutines are
entered and exited. This is similar to the register window system of the SPARC
architecture [9], except that IA-64 frames may be of variable size. Hardware
automatically renames the registers so that the current frame appears at the
start of the register file. Even though the optimizations described here do not
involve subroutine calls, the description of the register frame mechanism cannot
be completely ignored. Within a routine, attempts to write registers not in
the current frame will cause a fault, while reading such registers will produce
undefined results.


² The actual formalization uses tuples, as records are not supported in HOL Light.
Records, in the style of hol98 [8], have been used here to simplify the presentation.
A new type grindex is defined for the set of general purpose register indexes,
and we define a constant sof (size of frame) of that type to model the size of
the current frame. The actual value of sof is unimportant, except that all frames
must contain at least 32 registers.³


grindex ≝ {n | n < 128}        ⊢def 32 ≤ sof                       [sof_def]


We define predicates reg_valid_source and reg_valid_dest to indicate which re- 
gisters may be read and written. 


⊢def reg_valid_source i = i < sof                                 [reg_valid_source_def]
⊢def reg_valid_dest i = reg_valid_source i ∧ (i ≠ 0)              [reg_valid_dest_def]


The definition of the read and write operations for registers is straightforward;
the only complications are due to the under-determined value of invalid
reads and the hard-wired value of register 0. Note that the parameter x, as before,
is used to allow the result of an undefined read to depend on some abstract
notion of the general machine state.


⊢def reg_valid_source i ⟹                                         [reg_read_def]
       reg_read x f i = if i = 0 then <| val := 0; nat := F |> else f i


⊢def reg_write f i v = (λj. if j = i then v else f j)             [reg_write_def]


It was not necessary to under-specify the result of invalid writes, because IA-64
instructions raise a fault rather than attempt this operation.

The following basic theorems regarding register operations are necessary step- 
ping stones to verifying the optimizations. 


⊢ reg_read x f 0 = <| val := 0; nat := F |>                       [reg_read_zero_thm]
⊢ reg_valid_dest i ⟹ reg_read x (reg_write f i v) i = v           [reg_read_eq_thm]


⊢ reg_valid_source i ∧ i ≠ j ⟹                                    [reg_read_ne_thm]
    reg_read x (reg_write f j v) i = reg_read x f i


⊢ reg_write (reg_write f i v) i w = reg_write f i w               [reg_write_eq_thm]
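The register operations and the first three theorems can be exercised on concrete values. The Python sketch below is an illustration under stated assumptions: SOF is fixed at 32 (the model only requires 32 ≤ sof), register files are Python functions, and the abstract state parameter x is accepted but ignored because invalid reads are rejected rather than left under-determined.

```python
from collections import namedtuple

Register = namedtuple("Register", ["val", "nat"])

SOF = 32                       # assumed frame size; the model only needs 32 <= sof
ZERO = Register(0, False)      # hard-wired contents of register 0


def reg_valid_source(i):
    return i < SOF


def reg_valid_dest(i):
    return reg_valid_source(i) and i != 0


def reg_read(x, f, i):
    """Read register i; only specified for valid sources (x is unused here)."""
    if not reg_valid_source(i):
        raise ValueError("read outside the current frame is unspecified")
    return ZERO if i == 0 else f(i)


def reg_write(f, i, v):
    """Functional update of register file f at index i."""
    return lambda j: v if j == i else f(j)


f0 = lambda j: Register(j * 10, False)   # hypothetical initial register file
v = Register(7, True)
```

Writing to register 0 has no visible effect on later reads, which is why reg_read_eq_thm needs the reg_valid_dest premise.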


3.3 Predicate Registers 


The IA-64 includes 64 one-bit predicate registers. These registers can be used
to mask execution of individual instructions. If the execution of an instruction
is conditional on the value of a predicate register, then the instruction is said
to be predicated on that register. The description of the predicate registers is
simpler than that of the general purpose registers, as there is no notion of frame:
the predicate registers are always visible, and their contents are always valid.
Strictly speaking, (almost) all IA-64 instructions are predicated; those which
are to be executed unconditionally are predicated on register 0, which is hard-wired
to true. Writes to predicate register 0 are allowed, but they have no visible
effect on the state. The operators on the predicate registers are pred_read and
pred_write, and their definitions are similar to those for reg_read and reg_write.


³ The size of frame (sof) is defined as a constant because its value is not changed by
the instructions used in the examples presented here. In general, however, its value
can change and is better modeled as part of the state space defined in Sect. 3.5.


prindex ≝ {n | n < 64}

⊢def pred_read f i = if i = 0 then T else f i                     [pred_read_def]
⊢def pred_write f i b = (λj. if i ≠ 0 ∧ j = i then b else f j)    [pred_write_def]

Properties similar to those proved about reg_read and reg_write hold for
pred_read and pred_write as well.

⊢ pred_read f 0 = T                                               [pred_read_0_thm]
⊢ i ≠ 0 ⟹ pred_read (pred_write f i b) i = b                      [pred_read_eq_thm]
⊢ i ≠ j ⟹ pred_read (pred_write f j b) i = pred_read f i          [pred_read_ne_thm]
⊢ pred_write (pred_write f i b) i c = pred_write f i c            [pred_write_eq_thm]
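The predicate register operations admit an equally direct reading. The following Python sketch is illustrative only; predicate files are modeled as Python functions from indexes to booleans, which is an assumption of the sketch.

```python
def pred_read(f, i):
    """Predicate register 0 is hard-wired to true."""
    return True if i == 0 else f(i)


def pred_write(f, i, b):
    """Writes to predicate register 0 are allowed but leave f unchanged."""
    return lambda j: b if (i != 0 and j == i) else f(j)


all_false = lambda j: False   # hypothetical initial predicate file
```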


3.4 The ALAT 


Data speculative, or advanced, load instructions must keep track of the integrity 
of data that has been loaded. Any subsequent stores overlapping the region 
loaded will invalidate the data. On the IA-64 this task is performed using an 
architectural feature called the Advanced Load Address Table (ALAT). 

Ideally, the ALAT records the following information for each speculatively 
loaded register: 


- Whether the data in the register is valid.
- If so, what region of memory the data was loaded from.


The ALAT entry for each register can be described as a record as follows: 


alat_entry ≝ <| valid: bool; addr: word; sz: size |>
The information recorded in the ALAT does not have to be completely accurate
to ensure the correct behavior of IA-64 programs. If the ALAT records
a register as holding valid data, then that data must be valid; but the ALAT 
may falsely record that the data in a register is invalid. Such inaccuracy could 
cause suboptimal performance as it may force valid data to be reloaded, but the 
functional behavior of the program should be unaltered. There are many reasons 
why a particular implementation of the ALAT might exhibit such inaccuracy. For 
example, the ALAT may have fewer entries than there are registers. In order to 
capture the full generality of potential ALAT implementations, the specification 
presented gives only those properties that must be honored to guarantee correct 
program execution. This specification allows the ALAT to lose information in 
a variety of controlled ways. Indeed, an empty table would trivially satisfy the 
specification, though it would make for inefficient execution. 

An ALAT will be modeled as a function from register indices to ALAT entries. 
The simplest operation on the ALAT does nothing except allow the ALAT to 
forget about the validity of one or more registers. This operation is called leak. 


⊢def (¬(t i).valid ⟹ ¬(leak t i).valid) ∧                         [leak_def]
     ((leak t i).valid ⟹ leak t i = t i)


The first clause of the definition asserts that the leak operation will not cause an 
invalid register to become valid. The second clause states that any register still 
valid after the leak has the same ALAT entry it had before. 

A more constructive operation attempts to add information to the ALAT.
The function validate t i a s attempts to add to the ALAT t the fact that register i
contains valid data loaded from the region with address a and size s. It might
not succeed, and it may cause the ALAT to forget about other registers.


⊢def (¬(t j).valid ⟹ ¬(validate t i a s j).valid ∨ i = j) ∧       [validate_def]
     ((validate t i a s i).valid ⟹
        validate t i a s i = <| valid := T; addr := a; sz := s |>) ∧
     (i ≠ j ∧ (validate t i a s j).valid ⟹ validate t i a s j = t j)


The first clause of this definition asserts that validating a register i will not
cause any other register to become valid. The second clause asserts that if, after
validating register i, it is indeed valid, then i has associated with it the address
and size supplied. The final clause states that if any other register is valid after
the operation, the entry associated with it is unchanged.

The expression invalidate_single t i represents invalidating an individual register
i from an ALAT t. Using invalidate_single may also invalidate other registers.


⊢def (¬(t j).valid ⟹ ¬(invalidate_single t i j).valid) ∧          [invalidate_single_def]
     ((invalidate_single t i j).valid ⟹ i ≠ j ∧ invalidate_single t i j = t j)


The operation invalidate_multiple t a s has the effect of invalidating in ALAT t
all those registers loaded from regions that overlap the region with address a and 
size s. Other entries may also be invalidated as a result of this operation. 
⊢def (¬(t i).valid ⟹                                              [invalidate_multiple_def]
        ¬(invalidate_multiple t a s i).valid) ∧
     ((invalidate_multiple t a s i).valid ⟹
        ¬overlapped (t i).addr (t i).sz a s ∧ invalidate_multiple t a s i = t i)
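The ALAT operations are specified loosely so that many implementations qualify. The Python sketch below shows one hypothetical implementation that satisfies the clauses above: a table with a fixed CAPACITY that evicts its oldest entry when full. The dictionary representation (a register is valid exactly when it has an entry), the capacity of 2 and the eviction policy are all assumptions made for the illustration.

```python
CAPACITY = 2   # hypothetical table size, deliberately small


def overlapped(a1, s1, a2, s2):
    return max(a1, a2) < min(a1 + s1, a2 + s2)


def valid(t, i):
    """A register is valid exactly when the table holds an entry for it."""
    return i in t


def validate(t, i, a, s):
    """Record that register i was loaded from (a, s); may forget other entries."""
    new = {j: e for j, e in t.items() if j != i}
    while len(new) >= CAPACITY:
        new.pop(next(iter(new)))      # evict the oldest remaining entry
    new[i] = (a, s)
    return new


def invalidate_single(t, i):
    """Forget register i; other entries are kept unchanged."""
    return {j: e for j, e in t.items() if j != i}


def invalidate_multiple(t, a, s):
    """Forget every register loaded from a region overlapping (a, s)."""
    return {j: (a2, s2) for j, (a2, s2) in t.items()
            if not overlapped(a2, s2, a, s)}


t1 = validate({}, 3, 100, 4)
t3 = validate(validate(validate({}, 1, 0, 8), 2, 8, 8), 3, 16, 8)
```

No operation ever makes a register valid unless it is the one being validated, and surviving entries are never altered, which is all the specification demands.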


Further ALAT Freedoms: An ALAT has the freedom to lie, in a conservative 
way, about the information it records. An ALAT may report that a register 
contains invalid data, even when it records that the data is valid. A subsequent 
query about the register may correctly answer that the data is valid. To model 
this behavior, we need another function to check the validity of a register. 


⊢def check x t i ⟹ (t i).valid                                    [check_def]

The important features of check are as follows:

- If check reports that a register is valid, then it really is valid.
- The value returned by check depends, in an unspecified way, on a variable x
  that represents an abstraction of the entire machine state.


3.5 The Machine State 


The whole machine is modeled as a record of the components described thus far: 


state ≝ <| ip: num;                       -- instruction pointer
           mem: word → word;              -- data memory
           grfile: grindex → register;    -- general purpose register file
           prfile: prindex → bool;        -- predicate register file
           alat: grindex → alat_entry;    -- advanced load address table
           unknown: ind |>                -- other unknown state


Two components of the state record were not previously alluded to. The
instruction pointer ip stores the location of the current instruction in a separate
instruction memory. The unknown field represents an abstraction of the other
aspects of the state of an IA-64 that are not modeled here. Its purpose is to
serve as an argument to under-specified functions where the result may depend
on things other than those components of the state that have been modeled
concretely. The type ind is used for unknown because little is known about it, except
that it is large enough to encode a representation of the complete machine state.


4 IA-64 Instruction Semantics


To verify the optimizations presented in Sect. 2 we need to model the effect of 
executing an IA-64 program until a particular point in the code is reached. We 
will model the outcome of doing this with a new type outcome. 
outcome ≝ STATE state     -- reaches nominated state
        | FAULT fault     -- faults before reaching nominated state
        | ⊥               -- neither faults nor reaches nominated state


⊢def ld_a p s r₁ r₂ σ =                               p:  predicate
  let addr = reg_read σ.unknown σ.grfile r₂ in        s:  size of data to load
  let data = mem_read σ.mem addr.val s in             r₁: destination register
  if ¬pred_read σ.prfile p then                       r₂: register with source address
    STATE σ                                           σ:  initial state
  else if ¬reg_valid_dest r₁ then
    FAULT ILLEGAL_OPERATION
  else if addr.nat then
    FAULT NAT_CONSUMPTION
  else if ¬mem_valid_source addr.val s then
    FAULT ILLEGAL_LOAD
  else if mem_seq addr.val s then
    STATE σ with
      <| grfile := reg_write σ.grfile r₁ <| val := 0; nat := F |>;
         alat := invalidate_single σ.alat r₁ |>
  else
    STATE σ with
      <| grfile := reg_write σ.grfile r₁ <| val := data; nat := F |>;
         alat := validate σ.alat r₁ addr.val s |>

Fig. 3. Meaning of the advanced load instruction
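The advanced load definition of Fig. 3 can be transcribed almost line for line into executable form. The Python sketch below is an illustration under several assumptions that are not part of the model: finite maps are dictionaries with default contents, SOF is 32, no region is sequential, mem_valid_source is taken to be exactly the bound a + s < 2^64, the ALAT never loses entries, and the unknown component is carried along but ignored.

```python
from collections import namedtuple

Reg = namedtuple("Reg", "val nat")
State = namedtuple("State", "ip mem grfile prfile alat unknown")
SOF = 32                                   # assumed frame size


def zext(w, s):
    return w % 2 ** (8 * s)


def mem_read(m, a, s):
    return zext(m.get(a, 0), s)


def reg_read(f, i):
    return Reg(0, False) if i == 0 else f.get(i, Reg(0, False))


def pred_read(f, i):
    return True if i == 0 else f.get(i, False)


def reg_valid_dest(i):
    return 0 < i < SOF


def mem_valid_source(a, s):
    return a + s < 2 ** 64                 # one admissible instance


def mem_seq(a, s):
    return False                           # assumption: no sequential regions


def ld_a(p, s, r1, r2, st):
    """Advanced load of s bytes into r1 from the address held in r2."""
    addr = reg_read(st.grfile, r2)
    data = mem_read(st.mem, addr.val, s)
    if not pred_read(st.prfile, p):
        return ("STATE", st)               # predicated off: no effect
    if not reg_valid_dest(r1):
        return ("FAULT", "ILLEGAL_OPERATION")
    if addr.nat:
        return ("FAULT", "NAT_CONSUMPTION")
    if not mem_valid_source(addr.val, s):
        return ("FAULT", "ILLEGAL_LOAD")
    if mem_seq(addr.val, s):
        grfile = {**st.grfile, r1: Reg(0, False)}
        alat = {j: e for j, e in st.alat.items() if j != r1}
    else:
        grfile = {**st.grfile, r1: Reg(data, False)}
        alat = {**st.alat, r1: (addr.val, s)}
    return ("STATE", st._replace(grfile=grfile, alat=alat))


st0 = State(2000, {64: 0xAABBCCDD11223344}, {4: Reg(64, False)}, {}, {}, None)
```

Here ld_a(0, 4, 3, 4, st0) loads the low four bytes of the word at address 64 into register 3 and records the region (64, 4) in the table.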


The type fault describes IA-64 faults visible to applications programmers (i.e., 
page faults are not included). 


fault ≝ NAT_CONSUMPTION | ILLEGAL_OPERATION ...


The meaning of each IA-64 instruction can be specified as a function from 
an initial state to an outcome. An example giving the definition of the advanced 
load instruction can be found in Fig. 3. The form of the definition is similar 
to that of the C pseudo code that defines this instruction in the architecture 
guide [6]. A type encompassing all IA-64 instructions can now be defined. 


inst ≝ LD prindex size grindex grindex
     | LD_A prindex size grindex grindex
     | LD_S prindex size grindex grindex
     | CHK_A prindex grindex num


Assuming that we have a complete set of instruction meanings in the style 
of Fig. 3, we can define a function mapping instructions to their meaning. 
⊢def ⟦LD p s r₁ r₂⟧ = ld p s r₁ r₂ ∧                              [inst_sem_def]
     ⟦LD_A p s r₁ r₂⟧ = ld_a p s r₁ r₂ ∧
     ⋮


Some actions are common to all instructions and are therefore factored out.
In particular, each instruction should advance the instruction pointer and change
the unknown component of the state in some unspecified way. We define a function
to return the unknown component of the next state, based on the current
state σ and the instruction i. This function is completely unspecified.


⊢def next_unknown σ i = x ⟹ T                                     [next_unknown_def]


We can now define a function step to advance the execution of a program in an 
instruction memory p by one step. Should an instruction cause a fault, step will 
make no further progress. 


⊢def step p (STATE σ) =                                           [step_def]
       ⟦p σ.ip⟧ (σ with <| ip := σ.ip + 1;
                           unknown := next_unknown σ (p σ.ip) |>) ∧
     step p (FAULT f) = FAULT f


4.1 Execution Sequences 


The examples presented in Sect. 2 compare two programs by posing the question: 
Is the effect of one program when executed until it reaches some nominated 
instruction the same as that of another program when it is executed until it 
reaches a nominated instruction? To answer this question, we need to formalize 
what it means to execute a program until a nominated instruction is reached. 

The first thing to note is that some executions of a program will raise faults, 
and therefore never reach a particular target instruction. To be more precise then, 
we are interested in what it means to execute a program until some nominated 
instruction is reached or a fault is raised. If l is the location of the instruction we
are interested in, then the predicate at_or_fault l describes those outcomes where
we have reached our goal.


⊢def at_or_fault l (STATE σ) = (σ.ip = l) ∧                       [at_or_fault_def]
     at_or_fault l (FAULT f) = T


A program may contain loops, so during its execution it may execute the
same instruction many times. When we talk about executing a particular program
until a given instruction is reached, we are interested in the first time
that instruction is reached. It simplifies the formalization to introduce a binder
function that captures the concept of being ‘the first.’ We introduce the notation
‘ε₁ n. P n’ to represent the first number for which P holds. For example,
(ε₁ n. n > 10) is 11.


⊢def (∃n. P n) ⟹                                                  [ε₁_def]
       P (ε₁ n. P n) ∧ (∀m. m < (ε₁ n. P n) ⟹ ¬P m)
Using ε₁, we can define the expression (f until P) o to represent repeatedly
applying the function f to o until some desired outcome, characterized by P, is
reached. This expression yields ⊥ if the desired outcome can never be reached.


⊢def (f until P) o = if (∃n. P (f^n o)) then f^(ε₁ n. P (f^n o)) o else ⊥     [until_def]


We can now phrase as follows the meaning of ‘executing the program in p until
the instruction at l is reached.’


(step p) until (at_or_fault l)
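The binder for ‘the first’ and the until operator can be approximated executably. The logical definitions quantify over all natural numbers and so are not computable in general; the Python sketch below therefore searches only below a finite bound, which is an assumption of the sketch, and returns a BOTTOM marker when nothing is found within that bound. All names are hypothetical.

```python
BOTTOM = "BOTTOM"   # stands for the outcome that is never reached


def first(P, bound):
    """Least n < bound with P(n), or None when there is none below bound."""
    for n in range(bound):
        if P(n):
            return n
    return None


def iterate(f, n, o):
    """Apply f to o exactly n times."""
    for _ in range(n):
        o = f(o)
    return o


def until(f, P, o, bound=10_000):
    """Bounded approximation of (f until P) o."""
    n = first(lambda k: P(iterate(f, k, o)), bound)
    return BOTTOM if n is None else iterate(f, n, o)
```

For instance, repeatedly adding 3 to 0 until the value is at least 10 stops at 12, the first value in the sequence 0, 3, 6, 9, 12 that satisfies the predicate.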


4.2 Reasoning about until 


The following theorem allows us to reason about IA-64 programs using a form
of symbolic simulation within the theorem prover. It allows us to take repeated 
steps in the program until we reach the desired outcome. 


⊢ ((step p) until (at_or_fault l)) (STATE σ) =                    [until_step_thm]
    (if σ.ip = l then
       STATE σ
     else
       ((step p) until (at_or_fault l)) (step p (STATE σ))) ∧
  ((step p) until (at_or_fault l)) (FAULT f) = FAULT f


The proof of this theorem follows from a more general property of ε₁.


⊢ ¬P 0 ∧ (∃n. P n) ⟹ (ε₁ n. P n) = (ε₁ n. P (n + 1)) + 1          [first_suc_thm]
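The property can be spot-checked on concrete predicates. The following Python sketch is only a sanity check under the assumption that a witness exists below a finite search bound; it is not a proof, and the helper names are hypothetical.

```python
def first(P, bound=1000):
    """Least n < bound with P(n), or None when there is none below bound."""
    for n in range(bound):
        if P(n):
            return n
    return None


def first_suc_holds(P):
    """Check the statement of first_suc_thm for a concrete predicate P."""
    if P(0) or first(P) is None:
        return True      # a premise fails, so the implication holds trivially
    return first(P) == first(lambda n: P(n + 1)) + 1
```

With P n = (n > 10), the first n satisfying P is 11, and the first n satisfying P (n + 1) is 10, as the theorem predicts.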


5 Equivalent Behavior 


We now need to consider what it means for two programs to be equivalent. It 
may be too strong a requirement to insist that the behavior of an optimized 
program be identical to that of the original code. For example, consider the 
two programs presented in Fig. 2. If the address in register r1 is not a valid
destination, then both these programs will raise an ILLEGAL_STORE fault. Similarly,
if the address in register r4 is not a valid source, then both will raise
an ILLEGAL_LOAD fault. If both these conditions hold then the unoptimized
code will raise an ILLEGAL_STORE fault and the optimized code will raise an
ILLEGAL_LOAD fault. Nevertheless, we might still consider these programs to 
be equivalent. More precisely, we will consider the behavior of two programs to 
be equivalent when they both raise faults, without insisting that they raise the 
same fault. Equivalence of behavior will therefore be defined on outcomes rather 
than simply being defined on states. 

The programs shown in Fig. 1 are even more problematic. In the case where
predicate register p1 holds false their behavior is the same, but when p1
holds true the optimized code writes data to register r9 where the unoptimized
code does not. This will be a problem unless r9 is a scratch register,
the contents of which are not of ongoing interest. Assuming that is the case,
our notion of equivalence needs to be broadened to encompass programs with
identical behavior across a nominated set of interesting registers.

We begin by defining an equivalence relation on register files that holds if
some initial region of the register files is the same.


⊢def (f₁ ≅ₙ f₂) =                                                 [grfile_eq_def]
       n < sof ∧ (∀i x₁ x₂. i < n ⟹ reg_read x₁ f₁ i = reg_read x₂ f₂ i)


The following properties are important when reasoning about register files.⁴


⊢ f ≅ₙ f                                                          [grfile_eq_refl_thm]


⊢ (i ≥ n) ⟹                                                       [grfile_eq_above_thm]
    ((reg_write f₁ i v) ≅ₙ f₂) = (f₁ ≅ₙ f₂) ∧
    (f₁ ≅ₙ (reg_write f₂ i v)) = (f₁ ≅ₙ f₂)


Having defined an equivalence relation on register files, we can now define 
one on execution outcomes. 


⊢def (STATE σ₁ ≅ₙ STATE σ₂) =                                     [outcome_eq_def]
       (σ₁.mem = σ₂.mem ∧ σ₁.grfile ≅ₙ σ₂.grfile ∧ σ₁.prfile = σ₂.prfile) ∧
     (STATE σ ≅ₙ FAULT f) = F ∧
     (STATE σ ≅ₙ ⊥) = F ∧
     (FAULT f ≅ₙ STATE σ) = F ∧
     (FAULT f₁ ≅ₙ FAULT f₂) = T ∧
     (FAULT f ≅ₙ ⊥) = F ∧
     (⊥ ≅ₙ STATE σ) = F ∧
     (⊥ ≅ₙ FAULT f) = F ∧
     (⊥ ≅ₙ ⊥) = T
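The equivalence on outcomes can be sketched in Python as follows. The encoding is an assumption made for illustration: outcomes are tuples tagged "STATE", "FAULT" or "BOTTOM", states are dictionaries, register files are dictionaries with a default value, and SOF is fixed at 32. Because invalid reads never arise for indexes below n < SOF, the quantification over the abstract state parameters disappears.

```python
SOF = 32   # assumed frame size


def reg_read(f, i):
    """Register 0 is hard-wired; absent registers default to (0, False)."""
    return (0, False) if i == 0 else f.get(i, (0, False))


def grfile_eq(n, f1, f2):
    """Register files agree on the first n registers."""
    return n < SOF and all(reg_read(f1, i) == reg_read(f2, i) for i in range(n))


def outcome_eq(n, o1, o2):
    """Outcome equivalence: states are compared, faults are all alike."""
    if o1[0] == "STATE" and o2[0] == "STATE":
        s1, s2 = o1[1], o2[1]
        return (s1["mem"] == s2["mem"]
                and grfile_eq(n, s1["grfile"], s2["grfile"])
                and s1["prfile"] == s2["prfile"])
    # FAULT/FAULT and BOTTOM/BOTTOM are equivalent whatever the payload;
    # every mixed pair is not.
    return o1[0] == o2[0]


s = {"mem": {0: 1}, "grfile": {1: (5, False), 9: (1, False)}, "prfile": {}}
t = dict(s, grfile={1: (5, False), 9: (2, False)})
```

The two states s and t differ only in register 9, so they are equivalent at n = 5 but not at n = 10.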


6 Example Proof 


We can now return to a formal examination of the examples given in Sect. 2. We 
will consider only the example using data speculation, as its proof is the more 
challenging. We begin by specifying two instruction memories containing the 
original and optimized versions of the code from Fig. 2. Note that all instructions 
in these programs are unconditional, and hence predicated on register 0. 


⊢def original 1000 = ST 0 2 1 2 ∧                                 [original_def]
     original 1001 = LD 0 4 3 4


⁴ The equivalence relation ≅ₙ is also symmetric and transitive as expected, but these
properties are not used in the proofs described here.
⊢def optimized 2000 = LD_A 0 4 3 4 ∧                              [optimized_def]
     optimized 2001 = ST 0 2 1 2 ∧
     optimized 2002 = CHK_A 0 3 4000 ∧
     optimized 4000 = LD 0 4 3 4 ∧
     optimized 4001 = BR 0 2003


The problem can now be stated as follows: 


(STATE σ₁ ≅₅ STATE σ₂) ∧
reg_valid_source 1 ∧ reg_valid_source 2 ∧ reg_valid_source 4 ∧
σ₁.ip = 1000 ∧ σ₂.ip = 2000 ⟹
  (step original) until (at_or_fault 1002) (STATE σ₁) ≅₅
  (step optimized) until (at_or_fault 2003) (STATE σ₂)
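The statement can be tested on concrete inputs with a small simulator. The Python sketch below is an illustration that departs from the formal model in ways that should be read as assumptions: memory is byte-addressed and little-endian (the model leaves partially overlapping accesses unspecified), predicates, faults and the unknown component are omitted, instructions are encoded as tuples without their predicate argument, the ALAT never loses entries spontaneously, and execution is cut off after a fixed number of steps.

```python
from collections import namedtuple

State = namedtuple("State", "ip mem grfile alat")


def overlapped(a1, s1, a2, s2):
    return max(a1, a2) < min(a1 + s1, a2 + s2)


def step(prog, st):
    """Execute the instruction at st.ip and return the next state."""
    op, *args = prog[st.ip]
    nxt = st._replace(ip=st.ip + 1)
    if op == "ST":                       # ("ST", size, address reg, source reg)
        size, ra, rs = args
        a, w = nxt.grfile[ra], nxt.grfile[rs]
        mem = dict(nxt.mem)
        for k in range(size):
            mem[a + k] = (w >> (8 * k)) & 0xFF
        alat = {r: e for r, e in nxt.alat.items()
                if not overlapped(e[0], e[1], a, size)}
        return nxt._replace(mem=mem, alat=alat)
    if op in ("LD", "LD_A"):             # (op, size, destination reg, address reg)
        size, rd, ra = args
        a = nxt.grfile[ra]
        val = sum(nxt.mem.get(a + k, 0) << (8 * k) for k in range(size))
        alat = {r: e for r, e in nxt.alat.items() if r != rd}  # conservatively forget rd
        if op == "LD_A":
            alat[rd] = (a, size)
        return nxt._replace(grfile={**nxt.grfile, rd: val}, alat=alat)
    if op == "CHK_A":                    # ("CHK_A", reg, recovery address)
        reg, target = args
        return nxt if reg in nxt.alat else nxt._replace(ip=target)
    if op == "BR":                       # ("BR", target)
        return nxt._replace(ip=args[0])
    raise ValueError(op)


def run_until(prog, st, target, fuel=100):
    """Step until ip == target; None if that does not happen within fuel steps."""
    while st.ip != target:
        if fuel == 0:
            return None
        st, fuel = step(prog, st), fuel - 1
    return st


ORIGINAL = {1000: ("ST", 2, 1, 2), 1001: ("LD", 4, 3, 4)}
OPTIMIZED = {2000: ("LD_A", 4, 3, 4), 2001: ("ST", 2, 1, 2),
             2002: ("CHK_A", 3, 4000),
             4000: ("LD", 4, 3, 4), 4001: ("BR", 2003)}


def run_both(store_addr, load_addr):
    regs = {1: store_addr, 2: 0xBEEF, 3: 0, 4: load_addr}
    mem = {load_addr + k: 0x11 * (k + 1) for k in range(4)}
    a = run_until(ORIGINAL, State(1000, mem, regs, {}), 1002)
    b = run_until(OPTIMIZED, State(2000, mem, regs, {}), 2003)
    return a, b
```

With disjoint regions the check succeeds and the recovery code is skipped; with overlapping regions the store removes the table entry, the check fails, and the reload restores agreement with the original program.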


Note that equivalence between the executions can be proved only when the source 
registers are valid, as reading invalid registers returns unspecified results. Note 
also that the problem has been phrased using constant register names. We could 
also use variables to model symbolic register names, provided we add further 
assumptions asserting that the variables hold distinct values. 

To start the proof, we substitute concrete records for the states σ₁ and σ₂.
The assumptions allow us to select records with many common fields. We then
separate out those assumptions that remain of interest, yielding the goal below:

• r1 ≃ₛ r2
• reg_valid_source 1   • reg_valid_source 2   • reg_valid_source 4
(step original) until (at_or_fault 1002)
  (STATE <|ip := 1000; mem := m; grfile := r1;
           prfile := p; alat := a1; unknown := x1|>) ≃ₛ
(step optimized) until (at_or_fault 2003)
  (STATE <|ip := 2000; mem := m; grfile := r2;
           prfile := p; alat := a2; unknown := x2|>)


The records in this goal describe the symbolic state for both programs before 
any instructions have executed. Since neither program has reached its target 
instruction, we can use the theorems until_step_thm, step_def and inst_sem_def 
(see Sect. 4) to progress the symbolic execution of both programs as follows: 


(step original) until (at_or_fault 1002)
  ((st 0 2 1 2) (STATE <|ip := 1001; mem := m; grfile := r1;
                         prfile := p; alat := a1; unknown := x3|>)) ≃ₛ
(step optimized) until (at_or_fault 2003)
  ((ld_a 0 4 3 4) (STATE <|ip := 2001; mem := m; grfile := r2;
                           prfile := p; alat := a2; unknown := x4|>))


The values of the two unknown fields in the goal are actually expressions invol- 
ving the next_unknown, the instruction, and the previous state. Since these fields 
contain no useful information, it is clearer if we generalize the proof by replacing 
them with fresh variables, as shown above. 
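The way an `until` combinator drives execution towards a target instruction can be illustrated with a small Python sketch. This is not the HOL definition: the `fuel` bound, the tuple encoding of outcomes, and `toy_step` are assumptions made purely for illustration, and `None` stands in for the undefined outcome of a run that never reaches its target.

```python
def until(step, done, outcome, fuel=1000):
    """Apply `step` repeatedly until `done(outcome)` holds.

    `fuel` bounds the number of steps taken. Exhausting it returns None,
    a stand-in for the undefined outcome of a non-terminating run.
    """
    for _ in range(fuel):
        if done(outcome):
            return outcome
        outcome = step(outcome)
    return outcome if done(outcome) else None


def at_or_fault(target):
    # True for a fault, or for a state whose instruction pointer is `target`.
    return lambda o: o[0] == "FAULT" or o[1] == target


def toy_step(o):
    # Hypothetical one-instruction semantics: address 13 faults,
    # every other instruction just advances the instruction pointer.
    kind, ip = o
    return ("FAULT", "bad address") if ip == 13 else ("STATE", ip + 1)


reached = until(toy_step, at_or_fault(1002), ("STATE", 1000))
faulted = until(toy_step, at_or_fault(20), ("STATE", 10))
```

Each unfolding of the loop corresponds to one symbolic simulation step in the proof: either the stopping condition already holds, or one more instruction is executed.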
The next step is to expand the definitions of st and ld_a. The definition of
ld_a was presented in Fig. 3. These, and other, IA-64 instructions are defined 
as a selection among possible outcomes. We can use case analysis to reduce the 
resulting complex goal, that compares two conditionally defined outcomes, to 
a collection of simpler goals in which outcomes are compared under different 
premises. This step generates twenty subgoals, of which the following is among 
the most interesting.⁵


• r1 ≃ₛ r2                          • reg_valid_source 1
• reg_valid_source 2                • reg_valid_source 4
• reg_valid_dest 3                  • ¬(reg_read x3 r2 1).nat
• ¬(reg_read x3 r2 2).nat           • ¬(reg_read x4 r2 4).nat
• mem_valid_dest (reg_read x3 r2 1).val 2
• mem_valid_source (reg_read x4 r2 4).val 4
• ¬(mem_seq (reg_read x4 r2 4).val 4)
(step original) until (at_or_fault 1002)
  (STATE <|ip := 1001;
           mem :=
             mem_write m (reg_read x3 r2 1).val 2 (reg_read x3 r2 2);
           grfile := r1;
           prfile := p;
           alat := invalidate_multiple a1 (reg_read x3 r2 1).val 2;
           unknown := x3|>) ≃ₛ
(step optimized) until (at_or_fault 2003)
  (STATE <|ip := 2001;
           mem := m;
           grfile :=
             reg_write r2 3 <|val := mem_read m (reg_read x4 r2 4).val 4;
                              nat := F|>;
           prfile := p;
           alat := validate a2 3 (reg_read x4 r2 4).val 4;
           unknown := x4|>)


Here the execution of both programs has progressed by one instruction, without 
encountering a fault. The goal has also accumulated a number of assumptions 
that will reduce the number of case splits needed for successive symbolic simula- 
tion steps. We repeat this process until each outcome in every goal is reduced to 
either FAULT or a STATE where the instruction pointer has reached the target. 
Each goal can then be reduced using outcome_eq_def (see Sect. 5). Because of the 
trivial equivalence of any two faulting outcomes, only four goals remain unsolved 
by this process. 

Of the four subgoals that remain after symbolic simulation, two can be di- 
scharged by conditional rewriting with theorems about reading and writing re- 
gisters and memory (see Sect. 3). This could be done as part of each symbolic 


⁵ The assumption r1 ≃ₛ r2 has been used so that all reads refer to register file r2.
simulation step, but it is faster if done just once at the end. The two remaining
goals capture the heart of the problem: they hinge on the behavior of the ALAT.
In the first goal we see both programs have written data to register 3. The 
original program wrote the result of a read from memory, but the optimized 
program wrote the value zero. This must be the result of the advanced load 
having failed, causing a zero to be written, and the second load not having been 
performed. This should not happen, and indeed there is a contradiction in the 
assumptions. We have assumed that a check on register 3 in the ALAT succeeds, 
which is not possible since we have performed the operation invalidate_single on 
that register. This goal can be solved with the HOL Light model elimination 
procedure, MESON_TAC, using the ALAT definitions (see Sect. 3.4). 


• check x′
    (invalidate_multiple (invalidate_single a2 3) (reg_read x3 r2 1).val 2) 3
reg_write r1 3 <|val := mem_read ... (reg_read x3 r2 2) 4; nat := F|> ≃ₛ
reg_write r2 3 <|val := 0; nat := F|>


In the second case, both programs have loaded register 3 with four bytes 
read from memory at the address held in register 2. However, the loads have 
been performed on different memories. In the unoptimized code, the memory 
was first modified by writing two bytes to the address held in register 1. The 
values loaded to register 3 will be the same provided the memory regions read 
and written do not overlap. This fact is embodied in an assumption of the goal. 


• check x″
    (invalidate_multiple (validate a2 3 (reg_read x4 r2 4).val 4)
                         (reg_read x3 r2 1).val 2) 3
reg_write r2 3
  <|val := mem_read
             (mem_write m (reg_read x3 r2 1).val 2 (reg_read x3 r2 2).val)
             (reg_read x4 r2 4).val 4;
    nat := F|> ≃ₛ
reg_write r2 3 <|val := mem_read m (reg_read x4 r2 4).val 4; nat := F|>


The assumption shown states that register 3 was set valid and associated with 
the memory region that was read. An invalidate_multiple operation was then 
performed, invalidating all registers with data read from regions overlapping the 
region of memory that was written. A check of register 3 then asserts that it is 
still valid, from which we can deduce that the regions of memory read and written 
do not overlap. We can prove this lemma using MESON_TAC on the definitions of 
the ALAT operations. Once proved, we can use it and the definition of mem_write 
(see Sect. 3.1) to solve the goal. 
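The ALAT reasoning used for these two goals can be illustrated with a toy Python model. This is a sketch under assumptions, not the HOL definitions of Sect. 3.4: the table is modelled as a dictionary from register numbers to (address, size) pairs, the extra "unknown" argument that `check` takes in the goals is omitted, and `overlaps` is a hypothetical helper.

```python
def overlaps(addr1, size1, addr2, size2):
    # Two byte ranges [addr, addr + size) share at least one byte.
    return addr1 < addr2 + size2 and addr2 < addr1 + size1


def validate(alat, reg, addr, size):
    # Record that `reg` holds data loaded from the given memory region.
    new = dict(alat)
    new[reg] = (addr, size)
    return new


def invalidate_single(alat, reg):
    # Drop the entry for one register.
    return {r: entry for r, entry in alat.items() if r != reg}


def invalidate_multiple(alat, addr, size):
    # Drop every entry whose region overlaps the region being written.
    return {r: (a, s) for r, (a, s) in alat.items()
            if not overlaps(a, s, addr, size)}


def check(alat, reg):
    # The check succeeds only if the register still has a valid entry.
    return reg in alat


# Load 4 bytes at address 16 into register 3, then store 2 bytes at address 32.
after_store = invalidate_multiple(validate({}, 3, 16, 4), 32, 2)
```

In this model a successful check after the store implies that the loaded and stored regions are disjoint, and a check can never succeed on a register that was invalidated and not validated again, which mirrors the two facts the proof relies on.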

Both the examples presented in this paper were proved using HOL Light, as 
have other small examples using data speculation and transforming branching 
code into straight-line code using predication. All the proofs had the same form 
as the one just presented, in which the majority of the proof is completed by
symbolic simulation and rewriting. None required any more user interaction to 
complete than was needed for the proof just presented. 


7 Conclusion 


This paper described a formal model for a significant portion of Intel’s forthco- 
ming IA-64 architecture. Theorems were proved about the model that allowed a 
symbolic simulator to be built using the HOL Light theorem prover. This system 
can be used to largely automate simple optimization proofs for assembly-level 
IA-64 code. 

The scope of this research is intentionally limited. The problems considered 
are small, staying at the level of individual optimizing transformations rather 
than proofs about entire programs. Likewise the properties proved are modest, 
checking only for equivalence between two similar programs rather than attemp- 
ting to prove general correctness properties. By limiting the scope of the problem 
it was possible to find a solution that is largely automated. Indeed, the proofs 
could likely be more automated than they already are. The motivation for this 
approach comes from hardware verification where automated techniques with 
limited scope, like equivalence checking, have found industrial markets where 
more general interactive techniques have fared less well. 


8 Future Work 


One class of optimization not addressed by the work described here is software 
pipelining of loops. In these optimizations the original loop is transformed into 
a new loop where each cycle of the transformed loop executes instructions that 
correspond to steps within the execution of several successive iterations of the 
original loop. The transformation reduces data dependencies between instruc- 
tions within the loop, thereby hiding the latency of the slower instructions. The 
term ‘software pipelining’ derives from an analogy with hardware pipelining, 
where each cycle executes steps from several successive instructions. The IA-64 
includes several features that actively support the software pipelining of loops. 
We believe we can attack the problem of verifying transformations that pipeline 
a loop by building on the framework presented here using techniques analogous 
to those used to verify the equivalence of unpipelined and pipelined hardware 
implementations [1]. 


9 Related Work 


In this paper we have demonstrated a system for verifying optimizing transformations
to IA-64 assembly code. Perhaps the most closely related work is that of
the Refinement Calculator project, which has built a general system to support 
program transformation and refinement in HOL [2,10]. The work here differs 
from that in the following ways: 
Here we have worked with an unstructured assembly-level language, whereas
the Refinement Calculator (and similar transformation systems) manipulates
structured programs.

The work here has pursued a high degree of automation using symbolic
simulation, whereas systems like the Refinement Calculator usually focus
on supporting a user-guided interactive style of reasoning.
Formal Verification of IA-64 Division Algorithms


John Harrison 


Intel Corporation, EY2-03 
5200 NE Elam Young Parkway 
Hillsboro, OR 97124, USA 


Abstract. The IA-64 architecture defers floating point and integer di- 
vision to software. To ensure correctness and maximum efficiency, Intel 
provides a number of recommended algorithms which can be called as 
subroutines or inlined by compilers and assembly language programmers. 
All these algorithms have been subjected to formal verification using the 
HOL Light theorem prover. As well as improving our level of confidence 
in the algorithms, the formal verification process has led to a better un- 
derstanding of the underlying theory, allowing some significant efficiency 
improvements. 


1 Introduction 


IA-64 is a new 64-bit computer architecture jointly developed by Hewlett-Packard 
and Intel, and the Intel Itanium™ processor is its first silicon implementation. 
We will summarize below the details of the IA-64 instruction set architecture 
(ISA) necessary for the present paper. A more complete description may be fo- 
und in the IA-64 Application Developer’s Architecture Guide, available from 
Intel in printed form and online.¹

To avoid some of the limitations of traditional architectures, IA-64 incorporates
a unique combination of features, including an instruction format encoding
parallelism explicitly, instruction predication, and speculative/advanced loads 
[4]. Nevertheless, it also offers full upwards-compatibility with IA-32 (x86) code. 


1.1 The IA-64 Floating Point Architecture 


The IA-64 floating point architecture has been carefully designed to allow high 
performance. Features include multiple floating-point status fields and special 
instructions for transferring data between integer and floating point registers. 
The centerpiece of the architecture is the fma (floating point multiply-add or
fused multiply-accumulate) instruction. This computes xy + z from inputs x,
y and z with a single rounding error. Except for subtleties over signed zeros,
floating point addition and multiplication are just degenerate cases of fma, 1y + z
and xy + 0, so do not need separate instructions. Variants of the fma switch signs
of operands: fms computes xy − z while fnma computes z − xy.
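The effect of the single rounding can be seen in a short Python sketch. It is an illustration only: `fma` here is a hypothetical helper that emulates a fused multiply-add on IEEE doubles by computing the exact rational result and rounding once, and it ignores the signed-zero subtleties mentioned above.

```python
from fractions import Fraction


def fma(x, y, z):
    """Compute x*y + z exactly, then round once to the nearest double."""
    return float(Fraction(x) * Fraction(y) + Fraction(z))


x = 1.0 + 2.0 ** -30
y = 1.0 - 2.0 ** -30

# Separate multiply and add: the exact product 1 - 2**-60 is first rounded
# to 1.0, so the subsequent subtraction yields 0.0.
two_roundings = x * y + (-1.0)

# Fused: the exact product survives until the single final rounding.
one_rounding = fma(x, y, -1.0)
```

Here the separate operations lose the entire result, while the fused operation returns −2⁻⁶⁰ exactly. The same property is what makes the remainder computations later in the paper exact enough to be useful.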


¹ See http://developer.intel.com/design/ia64/downloads/adag.htm.
The IA-64 architecture supports several different floating point formats compatible
with the IEEE 754 Standard for Binary Floating-Point Arithmetic [10].
For the four most important formats, we give the conventional name, the precision,
and the minimum and maximum exponents. Thus, numbers in a format
with precision p and minimum and maximum exponent $E_{min}$ and $E_{max}$ are
those representable as:


$$\pm d_0 . d_1 d_2 d_3 \cdots d_{p-1} \times 2^e$$

with the $d_i \in \{0,1\}$ and $E_{min} \le e \le E_{max}$.


The single and double formats are mandated and completely specified in the 
Standard. The double-extended format (we will often just call it ‘extended’) is 
recommended and only partially specified by the Standard. The register format 
has the same precision as extended, but allows greater exponent range, helping 
to avoid overflows and underflows in intermediate calculations. As well as these 
“scalar” formats, IA-64 features a SIMD format where two single-precision numbers
are packed in a floating point register and the pair operated on in parallel.
Numerically, this amounts to just two parallel copies of the single-precision for- 
mat, but pragmatically it places different demands on the programmer since one 
can no longer use higher intermediate precision or range while maintaining the 
additional level of parallelism. 

Most operations, including the fma, take arguments and return results in 
some of the 128 floating point registers provided for by IA-64, in which floating 
point numbers from all formats map onto a standard bit encoding. By a com- 
bination of settings in the multiple status fields and completers on instructions, 
the results of operations can be rounded in any of the four IEEE rounding mo- 
des (to nearest, towards positive or negative infinity, and towards zero) and into 
any of the supported floating point formats, whatever format the operands come 
from. 


1.2 Division in Software 


In most current computer architectures, in particular the Intel IA-32 (x86) architecture
currently represented by the Pentium® III processor, instructions are
specified for the floating point and integer division operations. In IA-64, the 
only instruction specifically intended to support division is the floating point 
reciprocal approximation instruction, frcpa. This merely provides an approxi- 
mate reciprocal which software can use to generate a correctly rounded quotient. 
There are several reasons for relegating division to software. 
— By implementing division in software it immediately inherits the high degree
of pipelining in the basic fma operations. Even though these operations
take several clock cycles, new ones can be started each cycle while others 
are in progress. Hence, many division operations can proceed in parallel, 
leading to much higher throughput than is the case with typical hardware 
implementations. 

— Greater flexibility is afforded because alternative algorithms can be substitu- 
ted where it is advantageous. It is often the case that in a particular context 
a faster algorithm suffices, e.g. because the ambient IEEE rounding mode is
known at compile-time, or even because only a moderately accurate result 
is required (e.g. in some graphics applications). 

— In typical applications, division is not an extremely frequent operation, and
so it may be that die area on the chip would be better devoted to something 
else. However it is not so infrequent that a grossly inefficient software solution 
is acceptable, so the rest of the architecture needs to be designed to allow 
reasonably fast software implementations. 


1.3 Formal Floating Point Theory


The formal verifications are conducted using the freely available² HOL Light
prover [7]. HOL Light is a version of HOL [5], itself a descendant of Edinburgh
LCF [6] which first defined the ‘LCF approach’ that these systems take to formal 
proof. LCF provers explicitly generate proofs in terms of extremely low-level 
primitive inferences, in order to provide a high level of assurance that the proofs 
are valid. In HOL Light, as in most other LCF-style provers, the proofs (which 
can be very large) are not usually stored permanently, but the strict reduction to 
primitive inferences is maintained by the abstract type system of the interaction
and implementation language, which for HOL Light is CAML Light [16,3]. This 
language serves as a programming medium allowing higher-level derived rules 
(e.g. to automate linear arithmetic, first order logic or reasoning in other special 
domains) to be programmed as reductions to primitive inferences, so that proofs 
can be partially automated. In general, however, the user must describe the proof 
at a moderate level of detail. 

The verifications described here draw extensively on a formalized theory of 
real analysis [8] and floating point arithmetic [9]. These sources should be con- 
sulted for more details, but we now summarize some of the main formal concepts 
used in the present paper. 

HOL notation is generally close to traditional logical and mathematical nota- 
tion. However, the type system distinguishes natural numbers and real numbers, 
and maps between them by &; hence &2 is the real number 2. The multiplicative
inverse $x^{-1}$ is written inv(x), the absolute value |x| as abs(x) and the power
$x^n$ as x pow n.

Much of the theory of floating point numbers is generic. Formats are identified 
by triples of natural numbers fmt and the corresponding set of representable real 


² See http://www.cl.cam.ac.uk/users/jrh/hol-light/index.html.
numbers, ignoring the upper limit on the exponent range, is iformat fmt. The
second field of the triple, extracted by the function precision, is the precision,
i.e. the number of significand bits. The third field, extracted by the ulpscale
function, is N where $2^{-N}$ is the smallest nonzero floating point number of the
format.

Floating-point rounding is performed by round fmt rc x which denotes the 
result of rounding the real number x into a floating point format fmt under 
rounding mode rc, neglecting the upper limit on exponent range. The predicate 
normalizes determines whether a real number is within the range of normal 
floating point numbers in a particular format, i.e. those representable with a 
leading 1 in the significand, while losing determines whether a real number will 
lose precision, i.e. underflow, when rounded to a given format. 

An important concept in floating point arithmetic is a unit in the last place 
or ulp. Though widely used by floating point experts, there are a number of 
divergent definitions and care is needed in the formalization [9]. To understand 
the present paper, the following is adequate: if x is any real number and fmt 
identifies a floating point format, then ulp fmt x (‘an ulp in x with respect to 
floating point format fmt’) is the distance between the two closest floating point 
numbers straddling x. 

The canonical sign, exponent and significand fields for a representable real 
number are extracted by functions decode_sign, decode_exponent and
decode_fraction. Actual floating-point register bitstrings are distinguished from the
real numbers they represent, and the mapping from bitstrings to reals is perfor- 
med by a function Val. Whether a floating point number is normal is determined 
by a predicate normal. 


2 Perfect Rounding 


The IEEE Standard for Binary Floating-Point arithmetic [10] specifies that the 
result of division (as with other basic algebraic operations such as addition, 
multiplication and square root) should be as if the ideal mathematical result 
were calculated exactly then rounded in the appropriate rounding mode. Later 
we examine in detail how to make sure of this for division, but first some general 
discussion of perfect rounding and the related HOL proofs seems appropriate. 
Suppose x is the exact result of the operation, e.g. a/b in the case of division, 
and the calculated answer is z. Whatever the implementation, z will result from 
rounding an ideal mathematical answer, say y, to some operation. Anticipating 
later examples, suppose the final step of a division algorithm computes the final 
quotient from three arguments q, r and y by means of the fma operation: 


fma.pc.sf q = r3, y3, q3


Because the fma itself conforms to (the obvious extrapolation of) the IEEE
Standard, the result q arises from rounding the exact mathematical value
$q^* = r_3 y_3 + q_3$ in the intended rounding mode. We need to ensure that whatever the
rounding mode, $q^*$ and the exact quotient a/b round to the same floating point
number.


2.1 Sufficient Conditions for Perfect Rounding 


In the following diagram the longer markings denote floating point numbers and
the shorter ones the midpoints between floating point numbers. Assuming we
are in round-to-nearest mode, a/b will round to the number below it, but $q^*$ to
the number above it.


[Diagram: number line with a/b and q* marked on either side of a midpoint]


A little reflection shows that in order to ensure perfect rounding in the
round-to-nearest mode, a sufficient condition is that $q^*$ and a/b are never separated
by a midpoint, for which in turn it suffices that for any midpoint m we have
$|a/b - q^*| < |a/b - m|$. Quite generally, we can prove in HOL the following
theorem:


|- ¬(precision fmt = 0) ∧
   (∀m. m ∈ midpoints fmt ==> abs(x - y) < abs(x - m))
   ==> (round fmt Nearest x = round fmt Nearest y)


Obviously this precondition cannot be satisfied if a/b is exactly a midpoint. 
However it is easy to prove that this cannot occur provided the quotient is in the 
normal range: 


|- a ∈ iformat fmt ∧ b ∈ iformat fmt ∧
   ¬(b = &0) ∧ normalizes fmt (a / b)
   ==> ¬(a / b ∈ midpoints fmt)


For other rounding modes, an analogous property is required for floating 
point numbers rather than midpoints. To ensure correctness for all rounding 
modes, the following suffices. 


|- ¬(precision fmt = 0) ∧
   (∀a. a ∈ iformat(exprange fmt,precision fmt + 1,ulpscale fmt + 1)
        ==> abs(x - y) < abs(x - a))
   ==> (round fmt rc x = round fmt rc y)


Note that we state the theorem in terms of a floating point format with one 
extra bit of precision, which is exactly the floating point numbers plus midpoints: 


|- ¬(p = 0)
   ==> (midpoints(E,p,N) ∪ iformat(E,p,N) = iformat(E,p+1,N+1))
Since it is possible for the quotient to be exactly a floating point number, or
the midpoint between denormal numbers (e.g. $1.11\cdots11 \times 2^{E_{min}} / 2$), we need to
deal with these special cases separately. As we shall see, these work automatically 
for the algorithms as they are structured here. 


2.2 Flag Settings 


We must ensure not only correct results in all rounding modes, but that the flags 
are set correctly. However, this essentially follows in general from the correctness 
of the result in all rounding modes (strictly, in the case of underflow, we need 
to verify this for a format with slightly larger exponent range). For the correct 
setting of the inexact flag, we need only prove the following HOL theorem: 


|- ¬(precision fmt = 0) ∧
   (∀rc. round fmt rc x = round fmt rc y)
   ==> ∀rc. (round fmt rc x = x) = (round fmt rc y = y)


The proof is simple: if x rounds to itself, then it must be representable. But 
by hypothesis, y rounds to the same thing, that is x, in all rounding modes. In 
particular the roundings up and down imply x <= y and x >= y, so y = x. The
other way round is similar. 


2.3 Exclusion Zones 


The theorems above show that provided $q^*$ and a/b are closer to each other than
a/b is to a floating point number or midpoint, correct rounding is assured. One
approach to proving this for a given algorithm is to ask: how close can a/b be to a
floating point number or midpoint? A little work allows us to provide an answer 
to that question [2], which we can formalize as the following HOL theorem: 


|- a ∈ iformat(E,p,N) ∧
   b ∈ iformat(E,p,N) ∧
   c ∈ iformat(E,p+1,N+1) ∧
   &2 pow (p - 1) / &2 pow N <= abs(a) ∧
   ¬(b = &0)
   ==> (a / b = c) ∨
       abs(a / b - c) >= abs(a / b) / &2 pow (2 * p + 2)


It can be read as saying that every floating point number or midpoint c is
surrounded by an ‘exclusion zone’ of size approximately $|a/b| / 2^{2p+2}$ within which no
floating point quotient can lie. This implies that if a/b is not exactly a floating
point number, then having:


$$|q^* - a/b| < \frac{|a/b|}{2^{2p+2}}$$


would suffice for perfect rounding. By using higher intermediate precision to- 
gether with the benefit of the fma, this kind of relative error can be achieved 
without trouble, and some of the Intel division algorithms can be verified using
the above property. However, in the case of extended precision or SIMD operation,
we have no higher intermediate precision available. Then even the fma does
not quite allow us to guarantee getting $q^*$ that close to a/b in a straightforward
way, and we must prove more precise theorems, which we discuss below. 
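The exclusion zone bound can be sanity-checked by brute force for very small precisions. The Python sketch below is an illustration, not part of the HOL development: it assumes that a and b may be taken as integer significands in the range 2^(p−1) to 2^p − 1 (scaling by powers of two affects both sides of the inequality equally), and the function name is hypothetical.

```python
from fractions import Fraction


def exclusion_zone_holds(p):
    """Check the exclusion zone bound for all p-bit significands a and b."""
    lo, hi = 2 ** (p - 1), 2 ** p
    # Every (p+1)-bit number c in [1/2, 2]. All quotients a/b of p-bit
    # significands lie strictly inside this interval, so no c outside it
    # can be closer to a quotient than these candidates are.
    below_one = {Fraction(m, 2 ** (p + 1)) for m in range(hi, 2 * hi + 1)}
    above_one = {Fraction(m, hi) for m in range(hi, 2 * hi + 1)}
    candidates = sorted(below_one | above_one)
    for a in range(lo, hi):
        for b in range(lo, hi):
            q = Fraction(a, b)
            bound = q / 2 ** (2 * p + 2)
            for c in candidates:
                if q != c and abs(q - c) < bound:
                    return False
    return True


small_precisions_ok = all(exclusion_zone_holds(p) for p in range(2, 6))
```

Exhaustive checking of this kind is only feasible for toy precisions, which is one reason a general theorem is needed for the real formats.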

A refinement of the ‘exclusion zone’ approach is not only to identify the width 
of the exclusion zone but to isolate the inputs a and b where the quotients lie 
closest to floating point numbers or midpoints. Then one can get away with a 
worse error bound provided those special cases also work correctly, which one 
can verify by explicitly running through the algorithm. For the square root, 
this approach works well [2], and one can feasibly isolate a moderate number 
of ‘difficult cases’, allowing a uniform and effective way of verifying square root 
algorithms (which we have used in analogous verifications for square root). For 
division, there are too many solutions for a and b for this to be a feasible approach 
for verification. However, once either a or b is fixed — for example in the special 
case of finding reciprocals — the number of solutions is typically quite moderate. 


3 Implementing Division on IA-64 


The general form of the IA-64 assembly language frcpa instruction is:


frcpa.sf q, p = a, b


where q, a and b are floating point registers, p is a predicate register, and sf
is a floating-point status field. Essentially, a and b are the dividend and divisor
respectively, and q is the destination register for the result. The status field sf
controls the behavior in exceptional cases, e.g. division by zero, and the predicate
register p is set to false if the inputs were exceptional, e.g. if a or b was zero.
In the exceptional cases, q is set to the IEEE-correct quotient, either directly
by the hardware or via a SWA (software assistance) trap, and no further action
is necessary. Otherwise p is set to true and q is set to an approximation of 1/b
with a guaranteed relative error:


$$|q - 1/b| < 2^{-8.886}\,|1/b|$$


(In fact, the ISA specifies the details of the approximation more precisely, so 
that the particular value, which by the way has at most 11 significant bits, is 
predictable on all IA-64 processors.) Software is then expected to use this to 
arrive at the IEEE-correct quotient, i.e. the result that would be obtained if the 
quotient were calculated exactly then rounded using the ambient IEEE rounding 
mode. Moreover, the six IEEE flags must be set correctly, e.g. the inexact flag 
is set if and only if the quotient is not exactly a floating point number. 


3.1 Intel-Provided Algorithms 


It is not immediately obvious that without tricky and time-consuming bit- 
twiddling, it is possible to produce an IEEE-correct result and set all the IEEE 
flags correctly via ordinary software. Remarkably, however, fairly short straight-line
sequences of fma operations (or negated variants) suffice to do so. This
approach to division was pioneered by Markstein [11] on the IBM RS/6000³ family.
It seems that the ability to perform both a multiply and an add or subtract
without an intermediate rounding is essential to this, but besides its utility here, 
the fma has many other benefits in improving floating point performance and 
accuracy. 

Intel provides a number of recommended division and square root algorithms, 
in the form of short sequences of straight-line code written in IA-64 assembly
language. The intention is that these can be inlined by compilers, used as the 
core of mathematical libraries, or called on as macros by assembly language 
programmers. The algorithms are available for download from: 


http://developer.intel.com/software/opensource/numerics.htm


All the Intel-provided algorithms have been carefully designed to provide 
IEEE-correct results and trigger IEEE flags and exceptions appropriately. Sub- 
ject to this correctness constraint, they have been written to maximize perfor- 
mance on the Itanium™ processor. However, they are also likely to be the most 
appropriate algorithms for future IA-64 processors, even those with significantly 
different hardware characteristics. 

Separate algorithms are provided for the main IA-64 floating point formats 
(single, double, extended and SIMD), since faster algorithms are usually possible 
when the required precision is lower. As well as the multiplicity of formats, most 
algorithms have two separate variants, one of which is designed to minimize 
latency (i.e. the number of clock cycles between starting the operation and having 
the result available), and the other to maximize throughput (the number of 
operations executed per cycle, averaged over a large number of independent 
instances). Which variant is best to use depends on the kind of program within 
which it is being invoked. 


3.2 Refining Approximations 


First we will describe in general terms how we can use fma operations to re- 
fine an initial reciprocal approximation towards a better reciprocal or quotient 
approximation. For clarity of exposition, we will ignore rounding errors at this 
stage, and later show how they are taken account of in the formal proof. In the 
next subsection we cover the subtler issue of guaranteeing correct rounding. 

Consider determining the reciprocal of some floating point value b. Starting 
with a reciprocal approximation y with a relative error $\epsilon$:


$$y = \frac{1}{b}(1 + \epsilon)$$

we can perform just one fnma operation:


³ All other trademarks are the property of their respective owners.
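$$e = 1 - by$$

and get:

$$\begin{aligned}
e &= 1 - by \\
  &= 1 - b \cdot \frac{1}{b}(1 + \epsilon) \\
  &= 1 - (1 + \epsilon) \\
  &= -\epsilon
\end{aligned}$$

Now observe that:

$$\begin{aligned}
\frac{1}{b} &= \frac{y}{1 + \epsilon} \\
  &= y(1 - \epsilon + \epsilon^2 - \epsilon^3 + \cdots) \\
  &= y(1 + e + e^2 + e^3 + \cdots)
\end{aligned}$$
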
This suggests that we might improve our reciprocal approximation by multiplying
y by some truncation of the series $1 + e + e^2 + e^3 + \cdots$. The simplest case
using a linear polynomial in e can be done with just one more fma operation:


$$y' = y + ey$$


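Now we have


$$\begin{aligned}
y' &= y(1 + e) \\
   &= \frac{1}{b}(1 + \epsilon)(1 + e) \\
   &= \frac{1}{b}(1 + \epsilon)(1 - \epsilon) \\
   &= \frac{1}{b}(1 - \epsilon^2)
\end{aligned}$$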

The magnitude of the relative error has thus been squared, or looked at 
another way, the number of significant bits has been approximately doubled. 
This, in fact, is exactly a step of the traditional Newton-Raphson iteration for 
reciprocals. In order to get a still better approximation, one can either use a 
longer polynomial in e, or repeat the Newton-Raphson linear correction several 
times. Mathematically speaking, repeating Newton-Raphson iteration n times is
equivalent to using a polynomial $1 + e + \cdots + e^{2^n - 1}$, e.g. since $e' = \epsilon^2 = e^2$, two
iterations yield:


$$y'' = y(1 + e)(1 + e^2) = y(1 + e + e^2 + e^3)$$
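The algebra above can be checked with exact rational arithmetic, which plays the role of "ignoring rounding errors". The following Python sketch is illustrative only: the helper names and the sample values of b and ε are arbitrary choices, not taken from the Intel algorithms.

```python
from fractions import Fraction


def refine_reciprocal(b, y):
    """One Newton-Raphson step for 1/b, written as an fnma followed by an fma."""
    e = 1 - b * y      # fnma: e = 1 - b*y
    return y + e * y   # fma:  y' = y + e*y


def rel_err(b, y):
    """Relative error of y as an approximation to 1/b."""
    return b * y - 1


b = Fraction(3)
eps = Fraction(1, 256)            # relative error of the initial approximation
y0 = (1 / b) * (1 + eps)

y1 = refine_reciprocal(b, y0)     # one iteration
y2 = refine_reciprocal(b, y1)     # two iterations

e = 1 - b * y0                    # equals -eps
poly = y0 * (1 + e + e**2 + e**3) # cubic truncation of the series
```

With these values the relative error goes from ε to −ε² to −ε⁴, and the two-step result coincides exactly with the cubic polynomial evaluation.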
However, whether repeated Newton iteration or a more direct power series
evaluation is better depends on a careful analysis of efficiency and the impact of 
rounding error. The Intel algorithms use both, as appropriate. 

Now consider refining an approximation to the quotient with relative error
$\epsilon$; we can get such an approximation in the first case by simply multiplying a
reciprocal approximation $y \approx \frac{1}{b}$ by a. One approach is simply to refine y as
much as possible and then multiply. However, this kind of approach can never
guarantee getting the last bit right; instead we also need to consider how to
refine q directly. Suppose

$$q = \frac{a}{b}(1 + \epsilon)$$


We can similarly arrive at a remainder term by an fnma:

$$r = a - bq$$

when we have:

$$r = a - bq = a - b \cdot \frac{a}{b}(1 + \epsilon) = a - a(1 + \epsilon) = -a\epsilon$$


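In order to use this remainder term to improve q, we also need a reciprocal
approximation $y = \frac{1}{b}(1 + \eta)$. Now the fma operation:

$$q' = q + ry$$

results in, ignoring the final rounding:

$$q' = q + ry = \frac{a}{b}(1 + \epsilon) - a\epsilon \cdot \frac{1}{b}(1 + \eta) = \frac{a}{b}\bigl(1 + \epsilon - \epsilon(1 + \eta)\bigr) = \frac{a}{b}(1 - \epsilon\eta)$$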
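
The quotient correction can be sketched in the same way. This is an illustration only, assuming double precision and `Float.fma` in place of the IA-64 operations; `refine_quotient` and the sample values are hypothetical.

```ocaml
(* Remainder-based quotient refinement from the text:
     r  = a - b*q   (fnma)
     q' = q + r*y   (fma)
   Assumption: doubles and Float.fma stand in for the IA-64 operations. *)
let refine_quotient a b q y =
  let r = Float.fma (-. b) q a in
  Float.fma r y q

let () =
  let a = 1.0 and b = 3.0 in
  let q = 0.33 in        (* quotient with relative error about -1e-2 *)
  let y = 0.3333 in      (* reciprocal with relative error about -1e-4 *)
  let q' = refine_quotient a b q y in
  Printf.printf "relative error after refinement: %.3e\n"
    (Float.abs (q' *. b /. a -. 1.0))
```

The new relative error is close to the product of the two input errors (about 1e-6 here), matching the $(1 - \epsilon\eta)$ factor derived above.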


3.3. Obtaining the Final Result 


While we have neglected rounding errors hitherto, it is fairly straightforward to
place a sensible bound on their effect. To be precise, the error from rounding is
at most half an ulp in round-to-nearest mode and a full ulp in the other modes.

  ⊢ ¬(precision fmt = 0)
    ==> (abs(error fmt Nearest x) <= ulp fmt x / &2) ∧
        (abs(error fmt Down x) < ulp fmt x) ∧
        (abs(error fmt Up x) < ulp fmt x) ∧
        (abs(error fmt Zero x) < ulp fmt x)


where

  ⊢ error fmt rc x = round fmt rc x - x

In turn, we can easily get fairly tight lower and upper bounds on an ulp in x
in terms of the magnitude of x, the upper bound assuming normalization:

  ⊢ abs(x) / &2 pow (precision fmt) <= ulp fmt x

and

  ⊢ normalizes fmt x ∧ ¬(precision fmt = 0) ∧ ¬(x = &0)
    ==> ulp fmt x <= abs(x) / &2 pow (precision fmt - 1)
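
These two ulp bounds can be spot-checked for double precision. The sketch below is not the HOL definition of `ulp`: it assumes that, for a positive normal double, the gap to the next representable double is an adequate stand-in, and it fixes the precision at 53 bits. The names `ulp` and `within_bounds` are illustrative.

```ocaml
(* Assumption: for a positive normal double x, the gap to the next double
   is used as a stand-in for ulp(x); the paper's HOL definition is not
   reproduced here. *)
let precision = 53

let ulp x = Float.succ x -. x

(* Checks |x| / 2^p <= ulp x <= |x| / 2^(p-1); ldexp scales exactly. *)
let within_bounds x =
  let u = ulp x in
  let ax = Float.abs x in
  ldexp ax (-precision) <= u && u <= ldexp ax (-(precision - 1))

let () =
  List.iter
    (fun x -> Printf.printf "%g: %b\n" x (within_bounds x))
    [1.0; 1.5; 3.0; 0.1; 1e10; 1.9999999]
```

The upper bound is attained exactly at powers of two such as 1.0, which is why it cannot be tightened without further assumptions.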


Putting these together, we can easily prove simple relative error bounds on 
all the basic operations, which can be propagated through multiple calculations 
by simple algebra. It is easy to see that while the relative errors in the approximations
are significantly above $2^{-p}$ (where p is the precision of the floating point
format), the effects of rounding error on the overall error are minor. However,
once we get close to having a perfectly rounded result, rounding error becomes 
highly significant. How the algorithm is designed and verified now depends ra- 
dically on whether we have higher precision available. If we do, then we can 
usually rely on a simple ‘exclusion zone’ proof. Otherwise, we need more precise 
theorems, the central one being the following due to Markstein [11]: 


Theorem 1. If q is a floating point number within 1 ulp of the true quotient a/b
of two floating point numbers, and y is the correctly rounded-to-nearest approximation
of the exact reciprocal 1/b, then the following two floating point operations:

$$r = a - bq$$
$$q' = q + ry$$

using round-to-nearest in each case, yield the correctly rounded-to-nearest quotient
q'.


This is not too difficult to prove in HOL. First we observe that because
the initial q is a good approximation, the computation of r cancels so much
that no rounding error is committed. (This is intuitively plausible and stated by
Markstein without proof, but the formal proof was surprisingly messy.)

  ⊢ 2 <= precision fmt ∧
    a ∈ iformat fmt ∧ b ∈ iformat fmt ∧ q ∈ iformat fmt ∧
    normalizes fmt q ∧ abs(a / b - q) <= ulp fmt (a / b) ∧
    &2 pow (2 * precision fmt - 1) / &2 pow (ulpscale fmt) <= abs(a)
    ==> (a - b * q) ∈ iformat fmt


Now the overall proof given by Markstein is quite easily formalized. However, 
we observed that the property actually used in the proof is in general somewhat 
weaker than requiring y to be a perfectly rounded reciprocal. The theorem ac- 
tually proved in HOL is: 


Theorem 2. If q is a floating point number within 1 ulp of the true quotient
a/b of two floating point numbers, and y approximates the exact reciprocal 1/b
to a relative error $< \frac{1}{2^p}$, where p is the precision of the floating point format
concerned, then the following two floating point operations:

$$r = a - bq$$
$$q' = q + ry$$

using round-to-nearest in each case, yield the correctly rounded-to-nearest quotient
q'.


The formal HOL statement is as follows:

  ⊢ 2 <= precision fmt ∧
    a ∈ iformat fmt ∧ b ∈ iformat fmt ∧
    q ∈ iformat fmt ∧ r ∈ iformat fmt ∧
    ¬(b = &0) ∧
    ¬(a / b ∈ iformat fmt) ∧
    normalizes fmt (a / b) ∧
    abs(a / b - q) <= ulp fmt (a / b) ∧
    abs(inv(b) - y) < abs(inv b) / &2 pow (precision fmt) ∧
    (r = a - b * q) ∧
    (q' = q + r * y)
    ==> (round fmt Nearest q' = round fmt Nearest (a / b))


Although in the worst case, the preconditions of the original and modified 
theorem hardly differ (recall that $|x|/2^p \le \mathrm{ulp}(x) \le |x|/2^{p-1}$), it turns out
that in many situations the relative error condition is much easier to satisfy. In 
Markstein’s original methodology, one needs first to obtain a perfectly rounded 
reciprocal, which he proves can be done as follows: 


Theorem 3. If y is a floating point number within 1 ulp of the true reciprocal
1/b, then one iteration of:

$$e = 1 - by$$
$$y' = y + ey$$

using round-to-nearest in both cases, yields the correctly rounded reciprocal, except
possibly when the mantissa of b consists entirely of 1s.


If we rely on this theorem, we need a very good approximation to 1/b before
these two further serial operations and one more to get the final quotient using 
the new y’. However, with the weaker requirement on y’, we can get away with 
a correspondingly weaker y. In fact, we prove: 


Theorem 4. If y is a floating point number that results from rounding a value
$y_0$, and the relative error in $y_0$ w.r.t. 1/b is $\le \frac{d}{2^{2p}}$ for some natural number d
(assumed $\le 2^{p-2}$), then y will have relative error $< \frac{1}{2^p}$ w.r.t. 1/b, except possibly
if the mantissa of b is one of the d largest. (That is, when scaled up to an integer
$2^{p-1} \le m_b < 2^p$, we have in fact $2^p - d \le m_b < 2^p$.)


Proof. For simplicity we assume b > 0, since the general case can be deduced by
symmetry from this. We can therefore write $b = 2^e m_b$ for some integer $m_b$ with
$2^{p-1} \le m_b < 2^p$. In fact, it is convenient to assume that $2^{p-1} < m_b$, since when
b is an exact power of 2 the main result follows easily from $d \le 2^{p-2}$. Now we
have:

$$\frac{1}{b} = \frac{1}{2^e m_b} = 2^{-(e+2p-1)} \left(\frac{2^{2p-1}}{m_b}\right)$$

and $\mathrm{ulp}(\frac{1}{b}) = 2^{-(e+2p-1)}$. In order to ensure that $|y - \frac{1}{b}| < |\frac{1}{b}|/2^p$ it suffices,
since $|y - y_0| \le \mathrm{ulp}(\frac{1}{b})/2$, to have:

$$|y_0 - \tfrac{1}{b}| < (\tfrac{1}{b})/2^p - \mathrm{ulp}(\tfrac{1}{b})/2 = (\tfrac{1}{b})/2^p - 2^{-(e+2p-1)}/2 = (\tfrac{1}{b})/2^p - (\tfrac{1}{b})\,m_b/2^{2p}$$

By hypothesis, we have $|y_0 - \frac{1}{b}| \le (\frac{1}{b})\,d/2^{2p}$. So it is sufficient if:

$$(\tfrac{1}{b})\,d/2^{2p} < (\tfrac{1}{b})/2^p - (\tfrac{1}{b})\,m_b/2^{2p}$$

Canceling $(\frac{1}{b})/2^{2p}$ from both sides, we find that this is equivalent to:

$$d < 2^p - m_b$$

Consequently, the required relative error is guaranteed except possibly when
$d \ge 2^p - m_b$, or equivalently $m_b \ge 2^p - d$, as claimed.

The HOL statement is as follows. Note that it uses $e = d/2^{2p}$ as compared
with the statement we gave above, but this is inconsequential.


  ⊢ 2 <= precision fmt ∧
    b ∈ iformat fmt ∧
    y ∈ iformat fmt ∧
    ¬(b = &0) ∧
    normalizes fmt b ∧
    normalizes fmt (inv(b)) ∧
    (y = round fmt Nearest y0) ∧
    abs(y0 - inv(b)) <= e * abs(inv(b)) ∧
    e <= inv(&2 pow (precision fmt + 2)) ∧
    &(decode_fraction fmt b) <
      &2 pow (precision fmt) - &2 pow (2 * precision fmt) * e
    ==> abs(inv(b) - y) < abs(inv(b)) / &2 pow (precision fmt)


4 HOL Algorithm Verifications 


We will now give two examples of actual IA-64 division algorithms and describe
their HOL verification. Both algorithms are for single precision arithmetic, but 
one is a scalar algorithm that uses higher precision internally, and the other is a 
SIMD algorithm that uses only single precision operations. The two verifications 
thus present interesting contrasts. 


4.1 Scalar Single Precision Algorithm 


The following algorithm is for single precision computation, but makes clever use 
of the availability of higher intermediate precision. The steps of the algorithm 
are grouped into six stages which may be executed in parallel if the particular 
IA-64 machine allows this, as the Itanium™ processor does. The last column 
indicates the floating point format into which the result of that operation is 
rounded. Note that in all the algorithms we consider, all steps but the last are 
done in round-to-nearest mode, and the last in the ambient rounding mode. 


  1. y0 = (1/b)(1 + ε)      frcpa
  2. e0 = 1 - b y0          Register
     q0 = a y0              Register
  3. q1 = q0 + e0 q0        Register
     e1 = e0 e0             Register
  4. q2 = q1 + e1 q1        Register
     e2 = e1 e1             Register
  5. q3 = q2 + e2 q2        Register double
  6. q  = q3                Single

The algorithm forms an initial reciprocal approximation y0 and a quotient
approximation q0, then refines them both by two stages of Newton-Raphson
iteration. The subtlety is in the last two lines, where q3 is rounded to ‘register 
double’ (double precision but with a wider exponent range) and subsequently 
rounded again to single precision, in order to obtain a perfectly rounded result. 
We now turn to the formal verification. 
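
The data flow of the six stages can be sketched in OCaml. This is only an illustration under several assumptions that are not in the paper: doubles replace the register and register double formats, `frcpa_model` is a hypothetical stand-in for the `frcpa` instruction (the exact reciprocal perturbed by a relative error of 2^-9), and the final rounding to single is done through `Int32.bits_of_float`. It says nothing about correct rounding, which is what the formal proof establishes for the real formats.

```ocaml
(* Hypothetical stand-in for frcpa: exact reciprocal with a relative
   perturbation of 2^-9 (an assumption, not the instruction's behaviour). *)
let frcpa_model b = (1.0 /. b) *. (1.0 +. ldexp 1.0 (-9))

(* Round a double to single precision and convert back to a double. *)
let to_single x = Int32.float_of_bits (Int32.bits_of_float x)

(* Stages 1-6 of the scalar algorithm, with doubles in place of the
   register formats. Returns q3 and the value rounded to single. *)
let divide_sketch a b =
  let y0 = frcpa_model b in              (* stage 1 *)
  let e0 = Float.fma (-. b) y0 1.0 in    (* stage 2 *)
  let q0 = a *. y0 in
  let q1 = Float.fma e0 q0 q0 in         (* stage 3 *)
  let e1 = e0 *. e0 in
  let q2 = Float.fma e1 q1 q1 in         (* stage 4 *)
  let e2 = e1 *. e1 in
  let q3 = Float.fma e2 q2 q2 in         (* stage 5 *)
  (q3, to_single q3)                     (* stage 6 *)

let () =
  let q3, q = divide_sketch 1.0 3.0 in
  Printf.printf "q3 = %.17g, q = %.9g\n" q3 q
```

Ignoring rounding, the relative error goes from ε in q0 to ε² in q1, ε⁴ in q2 and ε⁸ in q3, so with a starting error near 2^-9 the polynomial part of the error in q3 is far below double precision.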

As detailed in [9], we have written derived rules that can automatically propagate
forward known upper and lower ranges on the size of arguments to the
result of fma-type operations, automatically verifying that the result neither 
overflows nor loses precision and hence that we can express the result as a rela- 
tive perturbation of the exact result. HOL’s programmability is vital here; these 
proofs would be extraordinarily tedious to orchestrate by hand. 

We do this for all steps of the algorithm, though we then have to reexamine 
some of them more precisely to make the proof work. Results of later lines have 
accumulated many errors from previous ones, and again we use an automatic 
HOL rule to bound these. The bounds derived automatically in this way are 
naive. For example, if we know $|\epsilon| \le 2^{-24}$, the automatic rule can deduce that
$y_0(1 + \epsilon)(1 - \epsilon) = y_0(1 + \epsilon')$ with $|\epsilon'| \le 2^{-24} + 2^{-24} + 2^{-24} \cdot 2^{-24}$. Of course, with
a little intelligence, a human can derive $|\epsilon'| \le 2^{-48}$. This kind of intelligence
has to be injected sometimes, but generally, the automated process is enough
to do the donkey work of keeping track of the dozens (hundreds in some other
verifications) of ultimately negligible error terms. The first important relative
error is in $q_3$ before rounding, i.e. $q_3^* = q_2 + e_2 q_2$. We find that $q_3^* = \frac{a}{b}(1 + \epsilon)$
with $|\epsilon| \le 197509/2^{80}$.
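
The gap between the automatic bound and the one a human would derive can be written out explicitly. Taking the hypothesis to be $|\epsilon| \le 2^{-24}$ (this reading of the bound is an assumption), the two estimates of the combined perturbation are:

```latex
\begin{align*}
\text{exact:}\quad (1+\epsilon)(1-\epsilon) &= 1 - \epsilon^2,
  \qquad\text{so}\quad |\epsilon'| = \epsilon^2 \le 2^{-48}, \\
\text{termwise:}\quad |\epsilon'| &\le |\epsilon| + |\epsilon| + |\epsilon|^2
  \le 2^{-24} + 2^{-24} + 2^{-48}.
\end{align*}
```

The termwise estimate treats the two factors as independent perturbations, which is why it misses the cancellation of the first-order terms.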

Now we distinguish two cases according to whether a/b is actually represen- 
table in the ‘register single’ format. (The use of register single rather than single 
simplifies the later argument, which is otherwise complicated by the possibility 
that a/b could be exactly the midpoint between two denormal numbers.) 

If a/b is in the register single format, then it is a fortiori in the register double 
format. Since $q_3^* = \frac{a}{b}(1 + \epsilon)$ with $|\epsilon| \le 2^{-62}$, it is clear that $q_3 = a/b$ exactly, and
so q is certainly the IEEE correct answer since it literally results from rounding 
a/b to single precision. 

If a/b is not in the register single format, then we still have a respectable
relative error for $q_3$ after rounding because rounding was into a format with
more than twice single precision. In fact, we have $q_3 = \frac{a}{b}(1 + \epsilon)$ with
$|\epsilon| \le 2^{-52}$, and examining the exclusion zone theorem, we need only $|\epsilon| \le 2^{-(2 \cdot 24 + 2)}$.
Consequently, correctness is proved.


4.2 SIMD Single Precision Algorithm 


The following algorithm is for SIMD single precision computation. It can also be 
grouped in 6 parallel stages, though on a machine capable of issuing fewer than
3 floating point operations per cycle, some instructions may need to be offset by
a cycle. 


  1. y0 = (1/b)(1 + ε)      frcpa
  2. d  = 1 - b y0          Single
     q0 = a y0              Single
  3. y1 = y0 + d y0         Single
     r0 = a - b q0          Single
  4. e  = 1 - b y1          Single
     y2 = y0 + d y1         Single
     q1 = q0 + r0 y1        Single
  5. y3 = y1 + e y2         Single
     r1 = a - b q1          Single
  6. q  = q1 + r1 y3        Single


Once again we can use the automated tools to produce simple relative error 
bounds for the intermediate stages. In this case, however, more human interven- 
tion in the proofs is necessary, since for extreme inputs the intermediate steps, 
which have no additional exponent range, can overflow or underflow. However, 
the parallel version of frcpa indicates this possibility by clearing a predicate re- 
gister, triggering the use of a different algorithm. We simply need to verify that 
the condition tested ensures that no overflow or underflow occurs here, which is 
easily done. 

First, suppose that a/b is exactly, or is very close to, a single precision floating 
point number c. In this case, the semi-automatic error analysis indicates that
$q_1^* = q_0 + r_0 y_1 = \frac{a}{b}(1 + \epsilon)$ with |ε| < 2~?5°, close enough to ensure that $q_1 = c$. As
before, this ensures that the exact cases work correctly, and allows us to dispose 
also of the directed rounding mode cases, since these are the only problematic 
ones for a simple exclusion zone proof. For the more difficult case of round- 
to-nearest and where the quotient is not close to a floating point number, the 
critical relative error result is for y3 before rounding, which is indicated in the 
HOL goal by the following derived assumptions:

  [`Val e * Val y_2 + Val y_1 = inv(Val b) * (&1 + e16)`]
  [`abs e16 <= &657 / &2 pow 50`]


In other words, $y_3^* = \frac{1}{b}(1 + \epsilon_{16})$ with $|\epsilon_{16}| \le 657/2^{50}$. Since
$657/2^{50} \le 165/2^{2 \cdot 24}$, we can now apply Theorem 4 to show that $y_3$ will satisfy the relative
error criterion needed for Theorem 2, except possibly when the mantissa of
b is one of the 165 largest. For these cases, HOL is programmed to evaluate
the result of the $y_3$ computation on them explicitly (dealing with the arbitrary
exponent scaling is the only slight difficulty), and it automatically confirms that
the criterion is always attained in these cases too. (Note that if this fails, we 
may still be able to show the overall quotient result will be correct, but it needs 
somewhat more work and has never arisen in practice so far.) Consequently, we 
can now apply Theorem 2 and deduce that the final result is correctly rounded 
and all flags set (subject to the criterion identified for the intermediate results 
not to overflow or underflow, which matches the cases indicated by the parallel 
frcpa). 
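
The arithmetic behind this application of Theorem 4 can be checked with exact integers. The sketch below assumes a 64-bit OCaml `int` (so the products do not overflow), with p = 24 and d = 165 as in the text; the names are illustrative.

```ocaml
(* Side conditions for applying Theorem 4 with p = 24 and d = 165.
   Assumption: native ints are 63 bits wide, so nothing overflows. *)
let p = 24
let d = 165
let pow2 n = 1 lsl n

(* 657 / 2^50 <= d / 2^(2p), cross-multiplied to stay in integers. *)
let hypothesis_ok = 657 * pow2 (2 * p) <= d * pow2 50

(* The theorem assumes d <= 2^(p-2). *)
let d_small_enough = d <= pow2 (p - 2)

(* Smallest scaled mantissa m_b that the theorem does not cover. *)
let first_exceptional = pow2 p - d

let () =
  Printf.printf "%b %b %d\n" hypothesis_ok d_small_enough first_exceptional
```

Only the mantissas from `first_exceptional` up to 2^24 - 1, that is 165 values, are left for explicit evaluation.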

A more complicated analysis (which has not been formalized in HOL) suggests
that while $y_3$ always satisfies the relative error criterion, it fails to be
perfectly rounded for precisely 12 of the possible 274 input b significands. Conse- 
quently, this algorithm could not be justified based only on Markstein’s theorems 
in their original form. 

Another situation where the new theorems allow us to justify faster algo- 
rithms is extended precision division. Using Markstein’s original theorems, it 
seems the best that can be achieved is the following:

  1. y0 = (1/b)(1 + ε)  [frcpa]
  2. e0 = 1 - b y0          q0 = a y0
  3. y1 = y0 + e0 y0        e1 = e0 e0
  4. y2 = y1 + e1 y1        r0 = a - b q0
  5. e2 = 1 - b y2
  6. y3 = y2 + e2 y2
  7. e3 = 1 - b y3          q1 = q0 + r0 y3
  8. y4 = y3 + e3 y3        r1 = a - b q1
  9. q2 = q1 + r1 y4


However, using the new theorems, we can justify the following, which is faster 
by one fma latency.

  1. y0 = (1/b)(1 + ε)  [frcpa]
  2. d  = 1 - b y0          q0 = a y0
  3. d2 = d d               d3 = d d + d
  4. d5 = d2 d2 + d         y1 = y0 + y0 d3
  5. y2 = y0 + y1 d5        r0 = a - b q0
  6. e  = 1 - b y2          q1 = q0 + r0 y2
  7. y3 = y2 + e y2         r  = a - b q1
  8. q  = q1 + r y3
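
As a check on the second listing, steps 3 to 5 can be expanded symbolically. Reading them as $d_2 = d \cdot d$, $d_3 = d \cdot d + d$, $d_5 = d_2 d_2 + d$, $y_1 = y_0 + y_0 d_3$ and $y_2 = y_0 + y_1 d_5$ (this reading of the subscripts is an assumption), and ignoring rounding, they evaluate a degree-six truncation of the power series for the reciprocal:

```latex
\begin{align*}
d   &= 1 - b y_0 \quad\Longrightarrow\quad
       \frac{1}{b} = \frac{y_0}{1 - d} = y_0 (1 + d + d^2 + \cdots) \\
y_1 &= y_0 + y_0 d_3 = y_0 (1 + d + d^2) \\
d_5 &= d_2 d_2 + d = d + d^4 \\
y_2 &= y_0 + y_1 d_5 = y_0 \bigl(1 + (1 + d + d^2)(d + d^4)\bigr) \\
    &= y_0 (1 + d + d^2 + d^3 + d^4 + d^5 + d^6)
     = \frac{1}{b}\,(1 - d^7)
\end{align*}
```

Under this reading, $y_2$ has relative error $-d^7$ before rounding, and it is reached with short dependency chains because $d_2$, $d_3$ and $d_5$ are computed alongside the quotient steps.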


5 Conclusions and Related Work 


We have outlined an approach to the formal verification of classes of division 
algorithms which is a formalization and improvement of standard theoretical 
approaches [11,2]. The approach has been successfully applied to a large number
of division algorithms that Intel is distributing to help IA-64 programmers,
helping to give greater confidence in the correctness of these subtle algorithms.
Moreover, the verification effort has led to some stronger theorems on which to 
base algorithms of this type, and so directly to some efficiency improvements. 

The verification is conducted on a detailed abstract model of the application 
programmer’s view of the IA-64 ISA, and naturally relies on the IA-64 processor
on which the code is run accurately implementing the ISA. Moreover, formal 
verification cannot completely guard against simple transcription errors in uti- 
lized versions of the code, a danger particularly significant since they may be 
inlined by various compilers and software development tools. For the purpose 
of isolating such errors as well as providing additional levels of assurance, In- 
tel has also developed extensive validation suites. Formal verification can never 
completely eliminate the need for such precautions, but it can allow us to focus 
testing on more productive areas. (Indeed, a particularly attractive feature of 
the ‘exclusion zone’ approach [2] is that the difficult cases are not only used in 
a formal proof but are also good test cases to exercise the algorithm and its 
practical realization.) 

As well as the floating point division work reported here, we have verified
various analogous square root algorithms using a formalization of the refined
exclusion zone approach [2]. In addition, we have formally verified several integer
divide algorithms, which use a specialized floating-point division algorithm as
a core. For an overview of the implementation of integer division on IA-64 and
proofs of correctness, see [1]. Much more detail about the IA-64 implementation
of division, square root and other mathematical functions is given in [12].

The closest related work to that described here is the formal verification of 
division algorithms reported in [13] and [15]. Although these are respectively for 
microcode and hardware RTL, and the present work is for software, this diffe- 
rence is not as significant as it may seem, since all these implementations seem 
to be modeled at a similar level. The major difference is that our work covers 
algorithms written using the standard resources available to the application pro- 
grammer, based on a high-level specification that the underlying operations are 
IEEE-correct. Other work on formal verification of division hardware using a 
combined theorem prover and model checker [14] is also closely related, but in 
this work the verification is taken down to a lower level (the implementation in 
terms of logic gates), and closely integrated with the overall design flow, helping 
to reduce the chance of transcription errors.

Abstract. Theorem provers for higher-order logics often use tactics to
implement automated proof search. Tactics use a general-purpose meta- 
language to implement both general-purpose reasoning and computatio- 
nally intensive domain-specific proof procedures. The generality of tactic 
provers has a performance penalty; the speed of proof search lags far 
behind special-purpose provers. We present a new modular proving ar- 
chitecture that significantly increases the speed of the core logic engine. 
Our speedup is due to efficient data structures and modularity, which 
allows parts of the prover to be customized on a domain-specific basis. 
Our architecture is used in the MetaPRL logical framework, with spee- 
dups of more than two orders of magnitude over traditional tactic-based 
proof search. 


1 Introduction 


Several provers [8,9,3,11,12,15,18] use higher-order logics for reasoning because 
the expressivity of the logics permits concise problem descriptions, and because 
meta-principles that characterize entire classes of problems can be proved and re- 
used on multiple problem instances. In these provers, proof automation is coded 
in a meta-language (often a variant of ML) as tactics. Automation speed has a 
direct impact on the level of reasoning. If proof search is slow, more interactive 
guidance is needed to prune the search space, leading to excessive detail in the 
tactic proofs. 

We present a proving architecture that addresses the problem of speed and cu- 
stomization in tactic provers. We have implemented this architecture in the Me- 
taPRL logical framework, achieving more than two orders of magnitude speed-up 
over the existing NuPRL-4 implementation. We obtain the speedup in two parts: 
our architecture is modular, allowing components to be replaced with domain- 
specific implementations, and we use efficient data structures to implement the 
proving modules. 


* Support for this research was provided by DARPA grants F30602-95-1-0047 and
F30602-98-2-0198

In this paper, we explain this modular architecture. We show that the logic
engine can be broken into three modules: a term module that implements the 
logical language, a term rewriter that applies primitive inferences, and a proof 
module that manages proofs and defines tactics. The computational behavior of 
proof search is dominated by term rewriting and operations on terms, and we 
present implementations of the modules for domains with frequent applications 
of substitution (like type theory), and for domains with frequent applications of 
unification (like first-order logic). 

MetaPRL, our testbed, is implemented in Objective Caml [19]. It includes logics
like first-order logic, the NuPRL type theory, and Aczel’s CZF set theory [1].
We include performance measurements that compare MetaPRL’s performance 
with NuPRL-4 on the NuPRL type theory. In our measurements, we also show 
how particular module implementations change the performance in the different 
domains. 

One might think that the comparison between MetaPRL and NuPRL-4 is 
not very fair since NuPRL-4 uses interpreted ML and MetaPRL is implemented 
in OCaml. But in fact only very high-level code uses interpreted ML in NuPRL- 
4 while most of the time is spent performing low-level operations such as term 
operations and primitive rule applications. And in NuPRL-4 all the low-level 
operations are implemented in Lisp and are compiled by a modern Lisp compiler. 
This should make the comparisons relatively fair, especially when we are talking 
about two orders of magnitude speed difference. 

It should also be noted that MetaPRL is a distributed prover [14], leading to 
additional speedups if multiple processors are used. Distribution is implemented 
by inserting a scheduling and communication layer between the refiner and the 
tactic interface. For this paper, we describe operation and performance without 
this additional scheduling layer. 

The organization of the paper is as follows. In Section 2, we give an overview of
tactic proving, and present the high-level architecture. In Sections 3, 4, and 5, we 
explore the proving modules in more detail, and develop their implementations. 
In Section 6, we compare the performance of the different implementations, and 
in Section 7 we summarize our results, and present the remaining issues. This 
work builds on the efforts of many systems over the last decade, and in Section 8 
we present related work. 


2 Architectural Overview 


We consider a general architecture of a tactic prover consisting of three parts, 
as shown in Figure 1. A logic contains the following kinds of objects: 


1. Syntax definitions define the language of a logic.
2. Inference rules define the primitive inferences of a logic. For instance, the
   first-order logic contains rules like MODUS_PONENS in a sequent calculus.

   $$\frac{}{\Gamma, A \vdash A}\ \text{AXIOM} \qquad \frac{\Gamma \vdash A \Rightarrow B \quad \Gamma \vdash A}{\Gamma \vdash B}\ \text{MODUS\_PONENS}$$

3. Rewrites define computational equivalences. For example, the type theory
   defines functions and application, with the equivalence $(\lambda x.b)\ a \longleftrightarrow b[a/x]$.
4. Theorems provide proofs for derived inference rules and axioms.

[Figure 1 labels: Logic-definition; Syntax-definitions; Rewrite-definitions; Inference-rules; Theorems; Tactic interface; Refiner]

Fig. 1. General tactic prover architecture


The refiner [5] performs two basic operations. First, it builds automation 
procedures from the parts of a logic. 


1. Syntax definitions are compiled to functions for constructing logical formulas. 
2. Rewrite primitives (and derived rewrite theorems) are compiled to conver- 
sions that allow computational reductions to be applied during a proof. 

3. Inference rules and theorems are compiled to primitive tactics for applying 

the rule, or instantiating the theorem. 


The second refiner operation is the application of conversions and tactics,
producing justifications from the proofs. The major parts of the refiner interface
are shown below.¹ It defines abstract types for data structures that implement
terms, tactic and rewrite definitions, proofs, and logics. Proof search is performed
in a backward-chaining goal-directed style. The refine function takes a
logic and a tactic search procedure, and applies it to a goal term to produce
a partial proof. The goal and the resulting subgoals can be recovered
with the goal_of_proof and subgoals_of_proof projection functions. Proofs can be
composed with the compose proof subproofs function, which requires that the goals
of the subproofs correspond to the subgoals of the proof, and that both derivations
occurred in the same logic. If an error occurs in any of the refiner functions, the
RefineError exception is raised. The tactic_of_conv function creates a tactic
from a rewrite definition. The final two functions, called tacticals, are the
primitives for implementing proof search. Operationally, the andthen tac1 tac2
tactic applies tac1 to a goal and immediately applies tac2 to all the subgoals,
composing the result. The orelse tac1 tac2 tactic is equal to tac1 on goals where
tac1 does not produce an error; otherwise it is equivalent to tac2.

¹ Throughout this paper we will use a simplified OCaml syntax to give the component
descriptions.


module type RefinerSig = sig
  type term, tactic, conv, proof, logic
  exception RefineError
  val refine : logic -> tactic -> term -> proof
  val goal_of_proof : proof -> term
  val subgoals_of_proof : proof -> term list
  val compose : proof -> proof list -> proof
  val tactic_of_conv : conv -> tactic
  val andthen : tactic -> tactic -> tactic
  val orelse : tactic -> tactic -> tactic
end
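
The operational reading of the two tacticals can be illustrated with a toy model. This is not MetaPRL's refiner: it assumes that a goal is just a string and that a tactic maps a goal to its list of remaining subgoals, raising `RefineError` on failure. The example tactics `split_and`, `trivial` and `id_tac` are hypothetical.

```ocaml
(* Toy model, not MetaPRL's refiner: goals are strings, and a tactic
   returns the remaining subgoals or raises RefineError. *)
exception RefineError

type goal = string
type tactic = goal -> goal list

(* Apply tac1, then tac2 to every resulting subgoal. *)
let andthen (tac1 : tactic) (tac2 : tactic) : tactic =
  fun g -> List.concat_map tac2 (tac1 g)

(* Behave like tac1 unless it fails, in which case use tac2. *)
let orelse (tac1 : tactic) (tac2 : tactic) : tactic =
  fun g -> try tac1 g with RefineError -> tac2 g

(* Hypothetical example tactics. *)
let split_and : tactic = fun g ->
  match String.index_opt g '&' with
  | Some i ->
    [ String.sub g 0 i; String.sub g (i + 1) (String.length g - i - 1) ]
  | None -> raise RefineError

let trivial : tactic = fun g -> if g = "T" then [] else raise RefineError
let id_tac : tactic = fun g -> [ g ]

(* Split a conjunction if possible, then close trivial subgoals. *)
let auto = andthen (orelse split_and id_tac) (orelse trivial id_tac)

let () =
  List.iter
    (fun g -> Printf.printf "%s -> [%s]\n" g (String.concat "; " (auto g)))
    [ "T&A"; "T"; "A&B" ]
```

In this model `auto "T&A"` leaves only the subgoal `A`, since the `T` branch is closed by `trivial` and the failure of `trivial` on `A` is absorbed by `orelse`.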


The logic data type is the concrete representation of a logic. The MetaPRL 
logical framework defines multiple logics in an inheritance hierarchy (partial 
order) where if $L_{child}$ : logic inherits from $L_{parent}$ : logic, all the theorems of
$L_{parent}$ are valid (and provable) in $L_{child}$. In contrast, the NuPRL-4 prover has
a single global logic containing the syntax and rules of the NuPRL type theory. 


In a prover like NuPRL-4, the refiner can be characterized as monolithic. There
is no well-defined separation of the refiner into components, and there is no
well-defined interface like the RefinerSig we defined above; there is one built-in
refiner. This has made it difficult to customize and maintain NuPRL-4, and our
choice in MetaPRL has been to partition the refiner into several small
well-defined parts.

[Figure: Refiner Types, grouped by layer: proof, logic; tactic, conv; term]

This modular structure has an additional benefit: if we partition the refiner 
into abstract parts, we can create domain-specific implementations of its parts. 
While the whole refiner is a part of a trusted code base, we do not need to worry 
about introducing bugs while doing domain-specific optimization. When we need 
to be extra sure that everything is correct, we can do proof development using
the domain-specific code and later double-check the proof using the reference 
implementation. And for some parts of the system we even have a debugging 
mode that runs two implementations side-by-side and notifies the user if they 
behave differently. This not only protects us from bugs introduced by the domain- 
specific code, but also helps us to debug the reference implementation as well. 

The choice of partitioning we use is guided by the type definitions, producing 
the layered architecture shown at the right. The lowest layer, the term module, 
defines the logical language; the rewriter module implements applications of pri- 
mitive tactics and conversions using term rewriting; and the proof module defines 
the logic and proof data types. We present specifications and implementations 
of these modules in the following section. 


3 The Term Module


All logical terms, including goals and subgoals, are expressed in the language of 
terms, implemented by the term module. The general syntax of all terms has 
three parts. Each term has 1) an operator-name (like “sum”), which is a unique 
name indicating the logic and component of a term; 2) a list of parameters 
representing constant values; and 3) a set of subterms with possible variable 
bindings. We use the following syntax to describe terms, based on the NuPRL 
definition [2]: 


$$\underbrace{\mathit{opname}}_{\text{operator name}}\ \underbrace{[p_1; \cdots; p_n]}_{\text{parameters}}\ \underbrace{\{v_1.t_1; \cdots; v_m.t_m\}}_{\text{subterms}}$$


A few examples are shown in the table below. Variables are terms with a string
parameter for their name; numbers have an integer parameter. The lambda term
contains a binding occurrence: the variable x is bound in the subterm b.

  Displayed form    Term
  1                 number[1]{}
  λx.b              lambda[]{x. b}
  f(a)              apply[]{f; a}
  v                 variable["v"]{}
  x + y             sum[]{x; y}

The term module implements several basic term operations: substitution 
(b[a/x]) of a term (a) for a variable (x) in a term (b), free-variable calculations,
α-equivalence, etc. When a logic defines a rule, the refiner compiles the
rule pattern into a sequence of term operations. The term interface is shown 
below. The abstract types opname, param, term, and bound_term represent ope- 
rator names, constant parameters, terms, and bound terms (the subterms of a 
term). The major operations include destructors to decompose terms and bound- 
terms, as well as a substitution function subst, free variable calculations, and 
term equivalence. 


module type TermSig = sig
  (* Types and constructors: *)
  type opname, param, term, bound_term
  val mk_opname : string list -> opname
  val mk_int_param : int -> param
  val mk_string_param : string -> param
  val mk_term : opname -> param list -> bound_term list -> term
  val mk_bterm : string list -> term -> bound_term

  (* Destructors and other operations: *)
  val dest_term : term -> opname * param list * bound_term list
  val dest_bterm : bound_term -> string list * term
  val subst : (string * term) list -> term -> term
  val free_vars : term -> string list
  val alpha_equal : term -> term -> bool
end


3.1 Naive Term Implementation (Term_std)


The most immediate implementation of terms is the naive “standard” imple- 
mentation, which builds the term with tupling. 

    type opname = string list
    and param = Int of int | String of string
    and term = opname * param list * bound_term list
    and bound_term = string list * term

While this structure is easy to implement, it suffers from poor substitution
performance. The following pseudo-code gives an outline of the substitution
algorithm.

    let rec subst sub t =
      if t is a variable then
        if (t, t') ∈ sub then t' else t
      else let (opname, params, bterms) = t in
        (opname, params, List.map (subst_bterm sub) bterms)
    and subst_bterm sub (vars, t) =
      let sub' = remove (v, t') from sub if v ∈ vars in                    (1)
      let vars', sub'' = rename binding variables to avoid capture in      (2)
        (vars', subst sub'' t)

The sub argument is a list of string/term pairs that are to be simultaneously 
substituted into the term in the second argument. The main part of the substi- 
tution algorithm is in the part for substituting into bound terms. In step @, the 
substitution is modified by removing any string/term pairs that are freshly bo- 
und by the binding list vars, and in step @), the binding variables are renamed 
if they intersect with any of the free variables in the terms being substituted. 
Roughly analyzed, this algorithm takes time at least linear in the size of the 
term on which the substitution is performed. Furthermore, each substitution 
performs a full copying of the term. Substitution is a very common operation in 
MetaPRL — each application of an inference rule involves at least one substi- 
tution.? The next implementation performs lazy substitution, useful in domains 
like type theory. 
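The pseudo-code above can be made concrete as follows. This is only a minimal
sketch under stated assumptions, not the MetaPRL source: the encoding of a
variable as a term with opname ["variable"] is borrowed from the dest_term
code shown later in this section, and the helpers mk_var, dest_var and fresh
(which primes a name until it is unused) are hypothetical.

```ocaml
(* Naive term representation, following the Term_std type definitions. *)
type opname = string list
type param = Int of int | String of string
type term = opname * param list * bound_term list
and bound_term = string list * term

(* Assumed variable encoding; mk_var and dest_var are illustrative helpers. *)
let mk_var v : term = ([ "variable" ], [ String v ], [])

let dest_var (t : term) =
  match t with
  | [ "variable" ], [ String v ], [] -> Some v
  | _ -> None

let rec free_vars (t : term) : string list =
  match dest_var t with
  | Some v -> [ v ]
  | None ->
      let _, _, bterms = t in
      List.concat_map
        (fun (vars, body) ->
          List.filter (fun v -> not (List.mem v vars)) (free_vars body))
        bterms

(* Hypothetical helper: prime [v] until it no longer occurs in [avoid]. *)
let rec fresh avoid v = if List.mem v avoid then fresh avoid (v ^ "'") else v

let rec subst (sub : (string * term) list) (t : term) : term =
  match dest_var t with
  | Some v -> ( try List.assoc v sub with Not_found -> t)
  | None ->
      let opname, params, bterms = t in
      (opname, params, List.map (subst_bterm sub) bterms)

and subst_bterm sub (vars, body) =
  (* Step 1: drop the pairs whose variable is rebound by [vars]. *)
  let sub' = List.filter (fun (v, _) -> not (List.mem v vars)) sub in
  (* Step 2: rename binders that would capture a variable of a substituted term. *)
  let incoming = List.concat_map (fun (_, s) -> free_vars s) sub' in
  let avoid = ref (incoming @ free_vars body @ vars) in
  let renamed =
    List.map
      (fun v ->
        if List.mem v incoming then begin
          let v' = fresh !avoid v in
          avoid := v' :: !avoid;
          (v, v')
        end
        else (v, v))
      vars
  in
  let renaming =
    List.filter_map
      (fun (v, v') -> if v = v' then None else Some (v, mk_var v'))
      renamed
  in
  (List.map snd renamed, subst (renaming @ sub') body)
```

Every call rebuilds the whole term and recomputes free variables at each
binder, which illustrates the copying and the at-least-linear cost described
above.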


3.2 Delayed Substitution (Term_ds) 


If substitution is frequent, it is often more efficient to save computations for use 
in multiple substitution operations. We use three main optimizations: we save 
free-variable calculations, we perform lazy substitution, and we provide special 
representations for commonly occurring terms. 

When a substitution is performed on a term for the first time, we compute
the set of free variables of that term, and save them for later use. When a
substitution is applied, the free-variables set is used to discard the parts of
the substitution for variables that do not occur free in the term. This saves
time, and it also saves space by reusing subterms where the substitution has no
effect instead of unnecessarily copying them. Memory savings, in turn, further
improve performance by improving the CPU cache efficiency and reducing the GC
time.

2 Testing for α-equivalence also takes linear time. One way to decrease the cost
would be to use a normalized representation (such as a DeBruijn representation).
However, term destruction on the normalized representation can be expensive
because of the need to rename variables that become free (what are the subterms
of λx.λy.xy and λx.λy.yx?). These renamings can be delayed, as the next section
shows, but the cost of equivalence testing will increase.

During proof search, most tactic applications fail, and only a part of the 
substitution result is usually examined in the proof search. In this common case, 
it is more efficient to delay the application of a substitution until the substitution 
results are actually requested by the dest_term function. 

We also optimize two commonly-occurring terms: variables and sequents. 
Rather than using the term encoding of variables, we provide a custom repre- 
sentation using a string. The sequent optimization uses a custom data structure 
to give constant-time access to the hypotheses, instead of the usual linear-time 
encoding. These “custom” terms are abstract optimizations—they do not change 
the Term interface definition. For each custom term, we add special-case handlers 
to each of the generic term functions. 

The following definition of terms uses all of these optimizations (the
definitions for the bound_term, opname and param types are unchanged). The
definition of sequents, which we omit, uses arrays to represent the hypotheses
and conclusions of the sequent.

  type term = { free_vars : VarsDelayed
                          | Vars of string set;
                core : Term of (opname * param list * bound_term list)
                     | Subst of (subst * term)
                     | Var of string
                     | Sequent of sequent }
  and subst = (string * term) list
  and sequent = ...

The free_vars field caches the free variables of the term, using VarsDelayed 
as a placeholder until the variable set is computed. The core field stores the term 
value, using the Term variant to represent values where a substitution has been 
expanded, the Subst variant to represent delayed substitutions, and the Var 
and Sequent variants for custom terms. We maintain the following invariants on 
Subst: substitution lists are never empty, and the domain of the substitution is 
included in the free-variables of the term. 

The free-variables computation is one of the more complex operations on this 
data structure. When the free variables are computed for a term, there are three 
main cases: if the free variables have already been computed, they are returned; 
if the core is a Term, the free variables are computed from the subterms; and 
if the core is a delayed substitution, the substitution is used to modify the free 
variables of the inner term. 

  let rec free_vars = function
      { free_vars = Vars fv } -> fv
    | { core = core } as t ->
        let fv = match core with
            Term (_, _, bterms) -> Set.map_list free_vars_bterm bterms
          | Subst (sub, t) -> free_vars_subst sub (free_vars t)
          | Var v -> Set.singleton v
          | Sequent seq -> free_vars_sequent seq
        in (t.free_vars <- Vars fv); fv
  and free_vars_bterm (bvars, t) =
    Set.subtract_list (free_vars t) bvars
  and free_vars_subst sub fv =
    Set.union
      (Set.subtract_list fv (List.map fst sub))
      (Set.map_list free_vars (List.map snd sub))

If the free variables haven’t already been computed, the free_vars function 
computes them, and assigns the value to the free_vars field of the term. In the 
Term case, the free variables are the union of the free variables of the subterms, 
where any new binding occurrences have been removed. In the Subst (sub, t) 
case, the free variables are computed for the inner term t, then the variables 
being replaced are removed from the resulting set, and then the free variables of 
the substituted terms are added. 

The subst function has a simple implementation: eliminate parts of the
substitution that have no effect (in order to maintain the invariant), and save
the result in a Subst pair if the resulting substitution is not empty.

  let subst sub t =
    let fv = free_vars t in
    match remove (v, t') from sub if v ∉ fv with                      ①
      [] -> t (* substitution has no effect *)
    | sub' -> { free_vars = VarsDelayed; core = Subst (sub', t) }

The set implementation determines the complexity of substitution. If the set
lookup takes O(1), then pruning ① takes time linear in the number of variables
in sub.
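The caching and pruning steps can be tried out with a cut-down version of the
data structure. The sketch below is an illustration under simplifying
assumptions, not the Term_ds module itself: opnames are plain strings,
parameters and sequents are left out, an option stands in for the
VarsDelayed/Vars pair, the standard Set.Make (String) is used for variable
sets, and the constructor helper mk is hypothetical.

```ocaml
module SS = Set.Make (String)

(* Simplified term: the free-variable set is cached once it is computed. *)
type term = { mutable free_vars : SS.t option; core : core }

and core =
  | Var of string
  | Term of string * (string list * term) list (* opname, bound terms *)
  | Subst of (string * term) list * term (* delayed substitution *)

(* Hypothetical helper: build a term whose free variables are not yet known. *)
let mk core = { free_vars = None; core }

let rec free_vars t =
  match t.free_vars with
  | Some fv -> fv (* already computed: return the cached set *)
  | None ->
      let fv =
        match t.core with
        | Var v -> SS.singleton v
        | Term (_, bterms) ->
            (* union over the subterms, minus each binding list *)
            List.fold_left
              (fun acc (bvars, body) ->
                SS.union acc (SS.diff (free_vars body) (SS.of_list bvars)))
              SS.empty bterms
        | Subst (sub, inner) ->
            (* remove the replaced variables, add those of the new terms *)
            List.fold_left
              (fun acc (_, s) -> SS.union acc (free_vars s))
              (SS.diff (free_vars inner) (SS.of_list (List.map fst sub)))
              sub
      in
      t.free_vars <- Some fv;
      fv

(* Prune pairs that cannot have any effect, then delay whatever is left. *)
let subst sub t =
  let fv = free_vars t in
  match List.filter (fun (v, _) -> SS.mem v fv) sub with
  | [] -> t (* substitution has no effect: share the original term *)
  | sub' -> mk (Subst (sub', t))
```

With these definitions a substitution whose domain misses the free variables
returns the very same term (physical equality), and a surviving substitution
costs one record allocation regardless of the size of the term.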

The effect of the substitution is delayed until the term is destructed. The 
dest_term function is required to expand the substitution by one step. We use 
the get_core function, shown below, to expand the toplevel substitutions in the 
term. If the substitution was applied to a Term, get_core will push it down to 
the immediate subterms. After the substitution is expanded, get_core will store 
the result in the core field to save time on the next get_core invocation. As 
usual, we omit the code for sequents. 


  let rec get_core = function
      { core = Subst (sub, t') } as t ->
        let core' = match get_core t' with
            Var v -> get_core (List.assoc v sub) (* always succeeds *)
          | Term (opname, params, bterms) ->
              Term (opname, params, List.map (do_bterm_subst sub) bterms)
          | Sequent seq -> Sequent (sequent_subst sub seq)
        in (t.core <- core'); core'
    | { core = simple_core } -> simple_core
  and do_bterm_subst sub (vars, t) =
    let sub' = remove (v, t') from sub if v ∈ vars in
    let vars', sub'' = rename binding variables to avoid capture in
    (vars', subst sub'' t)


Note that the List.assoc in the Var case will never fail, due to our invariants.
The dest_term function first uses get_core to expand the top-level substitution
(if any), and then it returns the parts of the term. To preserve the external
interface of the term module, it is also required to convert the custom terms
back to their original form.


  let rec dest_term t = match get_core t with
      Term (opname, params, bterms) -> (opname, params, bterms)
    | Var v -> (mk_opname ["variable"], [String v], [])
    | Sequent s -> dest_sequent s


4 The Rewriter Module 


The rewriter performs term manipulations for rule applications. Inference rules
and computational rewrites are both expressed using second-order patterns. For
example, the rewrite for beta-reduction is expressed with the following pattern:


    $(\lambda x.\, b_x)\ a \to b_a$


In this rewrite, the variable $a$ is a pattern variable, representing the
“argument” term. The variable $b_x$ is a second-order pattern variable,
representing a term with a free variable $x$. The pattern $b_a$ represents a
substitution, with $a$ substituted for $x$ in $b$. The term
$(\lambda x.\, b_x)\ a$ is called the redex, and the substitution $b_a$ is
called the contractum.


module Rewrite (Term : TermSig) : sig
  type redex_prog, con_prog, state
  exception RewriteError
  val compile_redex : term -> redex_prog
  val apply_redex : redex_prog -> term -> state
  val compile_contractum : redex_prog -> term -> con_prog
  val build_contractum : con_prog -> state -> term
end


In NuPRL-4 the computation and inference engines are implemented as separate
interpreters that are parameterized by the rewriting patterns. In MetaPRL we
combine these functions and improve performance by compiling to a rewriting
virtual machine. The MetaPRL rewriter module provides four major functions. The
compile_redex function takes a redex pattern, expressed as a term, and compiles
it to a redex program. The apply_redex function applies a pre-compiled program
to a specific term, raising the RewriteError exception if the pattern match
fails, or returning a state that summarizes the result. The compile_contractum
function compiles a contractum pattern against a particular redex program, and
the build_contractum function takes the contractum program and the result of a
redex application, and produces the final contractum term.

Currently, the rewrite module compiles redices to bytecode programs that
perform pattern matching, storing the parts of the term being matched in several
register files. Contracta are also compiled to bytecode programs that construct
the contractum term using the contents of the register file. The virtual machine
has the four parts shown in Figure 2:


1. a program store and program counter for the rewrite program, 

2. a term/bterm stack with a stack pointer to manage the current term being 
rewritten, 

3. a term/bterm register file, 

4. a parameter register file for each type of parameter. 


The instructions for the machine are shown in Figure 3. The matching
instruction dest_term checks if the term at the top of the term stack has the
operator name opname, and if it has the right number of bound terms and
parameters of the given types. If it succeeds, the parameters are saved in the
parameter registers, the term is popped from the term stack, and the bound terms
are pushed onto the stack. The mk_term instruction does the opposite: it
retrieves the parameter values from the register file, pops bc bound terms from
the stack, adds the opname and pushes the resulting term onto the stack. The
dest_bterm and mk_bterm instructions are used to save and restore binding
variables for the term at the top of the stack.

The so_var instruction pops a term from the term stack and saves it in
term register r, along with the free variables in $v_1, \ldots, v_n$. The
corresponding constructor so_subst pops bc terms from the stack, substitutes
them for the variables $v_1, \ldots, v_n$ in term r, and pushes the result onto
the stack. The match_term instruction is used during matching for redices like
$(x + x) \to 2x$ that contain common subterms.

The example in Figure 3 gives the code for a beta-reduction. The first
dest_term instruction matches the outermost apply term and pushes the function
and argument onto the stack. The dest_bterm operations remove the binding
variables of the subterms, and the so_var instructions store the results to
the register file. At the end of a match against the term $(\lambda x.\, b)\ a$,
register $r_1$ contains $b$, register $r_2$ contains $a$, and register $v_1$
contains $x$. When the contractum is constructed, the first instruction pushes
$a$ onto the stack; and the second instruction pops $a$, substitutes it for $x$
in $b$, and pushes the result.


[Diagram: rewriting virtual machine with a program store, a term/bterm stack,
term/bterm registers, number registers, and string registers]

Fig. 2. Rewrite virtual machine


Instructions                              Example: (λx.b_x) a → b_a

Matching:                                 Redex:
  dest_term  opname[p1; ...; pn].bc         dest_term  apply[].2
  dest_bterm v1, ..., vn                    dest_bterm
  match_term r[t1; ...; tn]                 dest_term  lambda[].1
  so_var     r[v1; ...; vn]                 dest_bterm v1
Constructors:                               so_var     r1[v1]
  mk_term    opname[p1; ...; pn].bc         dest_bterm
  mk_bterm   v1, ..., vn                    so_var     r2[]
  so_subst   r.bc                         Contractum:
                                            so_subst   r2.0
                                            so_subst   r1.1

pi: parameter register
vi: string register
r: term register
ti: literal term
bc: arity of bterm

Fig. 3. Virtual machine instructions


5 The Proof Module 


The third part of the refiner manages validity in logics as well as maintaining
proof trees for theorems. The proof module exports the interface shown below.
The empty_logic is the logic without any rules/rewrites. The join_logics
function builds the union of two logics, and the add_rule and add_rewrite
functions add rules/rewrites from their syntactical description as terms. The
proof type represents a partial proof tree [7], which may be modified by
applying a tactic to the proof goal with the refine function. The compose
function is used to stitch together partial proofs into larger proofs.
Bookkeeping must be performed here: the proofs being joined must belong to the
same logic. If an error occurs in any of the functions, the RefineError
exception is raised. These functions are not difficult to implement, and we skip
the description of their implementations.


module Proof (Term : TermSig) (Rewrite : RewriteSig) : sig
  type logic, tactic, rewrite, proof
  exception RefineError
  val empty_logic : logic
  val join_logics : logic -> logic -> logic
  val add_rule : logic -> term -> logic * tactic
  val add_rewrite : logic -> term -> logic * rewrite
  val new_proof : term -> proof
  val refine : proof -> tactic -> proof
  val compose : proof -> proof list -> proof
  val proof_goal : proof -> term
  val proof_subgoals : proof -> term list
end


6 Performance 


We group the performance measurements into two parts. All measurements were
done on a Linux 400MHz Pentium machine, with 512MB of main memory, and
all times are in seconds. For the first part, we compare the speed of the
MetaPRL prover (using the modular refiner) with the NuPRL-4 prover. For the
first example, we perform pure evaluation based on the following definition of
the factorial function:


    rewrite fact{i} ↔ if i = 0 then 1 else i * fact{i - 1}


We used the following evaluation algorithm: recursively traverse the term top- 
down, performing beta-reduction, unfolding the fact definition (taking care to 
evaluate the argument first), etc. This algorithm stresses search during rewriting. 
Roughly speaking, evaluation should be quadratic in the factorial argument: each 
term traversal is linear in the size of the term, and the size of the term grows 
linearly with each traversal (rewriting does not use tail-recursion), until the 
final base case is reached and the value is computed. The following table lists 
the performance numbers. 


                  Argument value
Configuration |   100     250     400     650
--------------+---------------------------------
Term_std      |   0.35    2.05    5.42    16.0
Term_ds       |   0.42    2.41    6.32    18.4
NuPRL-4       |   55      330     >1800   >1800


On this example, NuPRL-4 took between 125 and 160 times longer on the
problems where it finished within 30 minutes. On the two larger problems, we
terminated the computation after 30 minutes.³ In MetaPRL, the largest problem
performs about 14 million attempted rewrites.

This table also shows a difference between the term module implementations. 
The “naive” term module performs better on this example because the recursive 
traversals of the term expand most of the delayed substitutions. 
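
The quoted slowdown factors can be re-derived from the table. The snippet below
is only an arithmetic check added for illustration; the list names are
hypothetical, and only the two argument values that NuPRL-4 completed are
used, since the others were cut off at 1800 seconds.

```ocaml
(* Times in seconds, copied from the factorial table above. *)
let term_std = [ (100, 0.35); (250, 2.05) ]
let term_ds = [ (100, 0.42); (250, 2.41) ]
let nuprl4 = [ (100, 55.0); (250, 330.0) ]

(* Slowdown of NuPRL-4 relative to a MetaPRL configuration, per argument. *)
let ratios metaprl =
  List.map (fun (n, t) -> (n, List.assoc n nuprl4 /. t)) metaprl

let () =
  List.iter
    (fun (name, times) ->
      List.iter
        (fun (n, r) -> Printf.printf "%s, argument %d: %.0fx\n" name n r)
        (ratios times))
    [ ("Term_std", term_std); ("Term_ds", term_ds) ]
```

The four quotients fall between roughly 130 and 161, close to the range stated
in the text; the Term_ds quotients are the smaller ones because Term_ds is the
slower MetaPRL configuration on this example.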

The next example also compares MetaPRL with NuPRL-4, on the pigeonhole
problem stated in propositional logic.⁴ The pigeonhole problem of size i proves
that i + 1 “pigeons” do not fit into i “holes.” The pigeonT tactic performs
a search customized to this domain, and the propDecideT tactic is a generic
decision procedure for intuitionistic propositional logic (based on Dyckhoff’s
algorithm [10]). Both search algorithms use only propositional reasoning and
both explore an exponential number of cases in i.

3 NuPRL-4 can evaluate these terms. The built-in term evaluator, which bypasses
the refiner, evaluates the largest example in about 22 seconds.

4 This formalization of the pigeonhole principle and the methods we are using to
prove it are obviously highly inefficient. However, this formalization provided
us with a nice way of comparing the performance of simple propositional proof
search in the two systems.


                                Problem size
Configuration  Tactic       |   2       3       4
----------------------------+------------------------
Term_std       pigeonT      |   <0.1    2.53    94.0
Term_ds        pigeonT      |   <0.1    0.71    17.0
NuPRL-4        pigeonT      |   0.5     89      >1800
Term_std       propDecideT  |   0.3     238     >1800
Term_ds        propDecideT  |   0.13    55.0    >1800
NuPRL-4        propDecideT  |   21.9    >1800   >1800


In this example, NuPRL-4 runs between 125 and 170 times slower than
Term_ds. The delayed-substitution implementation of terms also performs
significantly better than the naive implementation, partly because of the
efficient substitution in the application of the rules for propositional logic,
and also because the Term_ds module preserves a great deal of sharing of common
subterms. On the largest problem the pigeonT tactic performs about 1.57 million
primitive inference steps.

For the last examples, we give a few comparisons between the MetaPRL
modules in two additional domains. The GEN problem is a heredity problem
in a large first-order database. The NUPRL problem is an automated rerun of
all proof transcripts in the NuPRL type theory. The transcripts contain a mix
of steps, ranging from low-level proof steps, such as lemma application and
application of inductive reasoning, to higher-level steps that include
verification-condition automation and proof search. The transcripts contain
about 2,500 interactive proof steps.


We don’t include performance measurements for NuPRL-4 on these examples,
because the system differences require a porting effort (for instance, NuPRL-4
does not currently implement a generic first-order proof search procedure). In
our experience with NuPRL-4, proofs with several hundred steps tend to take
several minutes to replay.

Configuration |   GEN     NUPRL
--------------+-----------------
Term_std      |
Term_ds       |   14.4    36.6

Once again, the Term_ds module performs better than the naive terms, due to
the frequent use of substitution in applications of the rules of these theories.
The times for proof replay include the time spent loading the proof transcripts
and building the tactic trees. This cost is similar for both term
implementations, and the performance numbers are comparable. The first-order
problem, GEN, performs proof search by resolution, using the refiner to
construct a primitive proof tree only when a proof is found. This final step is
expensive, because each resolution step has to be justified by the refiner. The
final successful proof in this problem performs about 41 thousand primitive
inference steps.

5 This does not include the space that the system occupied after the initial
loading: 19MB with Term_std and 20.5MB with Term_ds.


7 Summary 


We have achieved significant speedups for tactic proving. Our new prover
design shows consistent speedups of more than two orders of magnitude over the
NuPRL-4 system. Most of this speedup is due to efficient implementations of the
prover components, but an additional part is due to the modular design, which
allows the prover to be customized with domain-specific implementations. In
addition, the MetaPRL system is programmed in OCaml, an efficient modular
language. In contrast, NuPRL-4 tactics are programmed in classic ML, which is
compiled to Common Lisp, and the NuPRL-4 refiner is implemented in Common
Lisp.

In first-order logics, we estimate that an order of magnitude speed factor
remains between MetaPRL and provers like ACL2 [17]. Some of this difference can
be addressed with specific refiner modules: a first-order term module would
contain custom representations for terms in disjunctive normal form and
sequents (sequents provide particularly poor representations for large
first-order problems), and the rewrite module would optimize inference by
resolution. However, a better solution would be to integrate first-order provers
into the logical framework using translation modules that provide a tactic
interface through encapsulation of the external functions.

There are a few avenues left to explore. Since we compile rewrites to bytecode, 
it is natural to wonder what the effect of compiling to native code would be. Also, 
while we currently do not optimize the proof module, there is significant overhead 
in composing and saving the primitive proof trees. In some domains, we may be 
able to perform proof compression, or delay the composition of proofs. 


8 Related Work 


Harrison’s HOL-Light [13] shares some common features with the MetaPRL
implementation. Harrison’s system is implemented in Caml-Light, and both
systems require fewer computational resources than their predecessors. Howe [16]
has taken another approach to enhancing speed in NuPRL-4. The programming
language defined by the NuPRL type theory is untyped, leading to frequent
production of well-formedness (verification) conditions. Using type annotations,
Howe was able to speed up rewriting in NuPRL-4 by a factor of 10. We haven’t
attempted to apply Howe’s ideas to the MetaPRL implementation of the NuPRL type
theory, but we believe that MetaPRL performance can be further improved using
these ideas.

Basin and Kaufmann [4] give a comparison between the NuPRL-3 system and
NQTHM [6] (the predecessor of the ACL2 [17] system). The NQTHM prover uses a
quantifier-free variant of Peano arithmetic. Basin and Kaufmann’s measurements
showed that NQTHM was roughly 15 times faster than NuPRL-3 for different
formalizations of Ramsey’s theorem. It is likely that ACL2 and NuPRL-4 have
a larger gap in speed.


Implementing a Program Logic of Objects in a
Higher-Order Logic Theorem Prover

Abstract. We present an implementation of a program logic of objects,
extending that (AL) of Abadi and Leino. In particular, the implementation
uses higher-order abstract syntax (HOAS) and, unlike previous approaches
using HOAS, at the same time uses the built-in higher-order logic of the
theorem prover to formulate specifications. We give examples of
verifications, extending those given in [1], that have been attempted
with the implementation. Due to the mixing of HOAS and built-in logic
the soundness of the encoding is nontrivial. In particular, unlike in
other HOAS encodings of program logics, it is not possible to directly
reduce normal proofs in the higher-order system to proofs in the
first-order object logic.


1 Introduction 


The object-oriented (henceforth abbreviated as “OO”) style of programming has
proven to be exceptionally popular for developing large systems in a modular
fashion. Despite its popularity, it is still lacking with regard to formal
methods for verification.

This article is a foundational contribution towards the development of formal
verification tools for OO languages. We have implemented a program logic for an
object calculus, based on the logic from [1,2]. We have used the proof assistant
LEGO [6] for historic reasons, though the techniques can be applied to other
existing theorem provers, for example PVS and Isabelle/HOL. The encoding is
notable for using:


— HOAS for encoding program syntax; and 
— a direct embedding of the object logic into the metalogic. 


The use of HOAS simplifies the encoding since we inherit variable scoping rules
and alpha-conversion from the metalogic. Using the metalogic itself allows us to
use the built-in features of the theorem prover. As a consequence of these two
implementation decisions, soundness is non-trivial.

We give examples that have been attempted with the implementation. We
considered the examples from Abadi and Leino as presented in [2]. Furthermore
we extend their work with a new example based on the dining philosophers
scenario.

Hereafter, the article is organised as follows. We first present a summary
of the program logic from [2], giving the syntax of the object calculus and the
verification axioms. We then present our implementation. Though the actual
implementation is in LEGO, for expository purposes, we take as our metalanguage
the fragment without universes, dependent and inductive types. We then give a
selection of examples we have verified. A statement of the soundness property
as well as the idea behind the proof are given in Section 5 but the details will
be given elsewhere. Finally we conclude the article by giving some subjective
views which have arisen from the work, and give a survey of related work.


2 The Abadi-Leino Logic 


In [2], we are presented with an imperative, typed object language with subtyping 
but not recursive types. The language is given a syntax-directed operational 
semantics and also a Hoare-style verification logic for program correctness, which 
we will refer to simply as AL. The verification logic is proved to be sound but is 
also shown to be incomplete. 

Objects are records of fields and methods. The only other types apart from 
objects are booleans and natural numbers. Each field is of primitive type or 
object type. Each method has exactly one bound variable denoting “self”. The 
methods do not have any other formal parameters but arguments can be indi- 
rectly passed to them by first assigning to the fields in the object. There is no 
data abstraction nor an inheritance mechanism. 

One record is a subtype of another if the one contains all fields of the other 
and for each method m in the other, a method whose return type is a subtype of 
that of m. The subtyping relation is said to be covariant with respect to method 
return values and invariant with respect to fields. 

Method bodies are of the form $\varsigma(s)b$ where $b$ is a program typically
with free occurrences of $s$. Programs are built up using constants (booleans
and natural numbers), variables, and constructors for objects, let statements,
conditional statements, field lookup, method invocation and field update. For
field names $f_0, \ldots, f_k$, variables $x_0, \ldots, x_k$, method names
$m_0, \ldots, m_l$, and method bodies
$\varsigma(s)b_0, \ldots, \varsigma(s)b_l$, we have the program

    $[f_0 = x_0, \ldots, f_k = x_k, m_0 = \varsigma(s)b_0, \ldots, m_l = \varsigma(s)b_l]$

which evaluates to a reference to an object with fields $f_i$ and methods $m_j$.
For a variable x and programs a and b, where b typically contains free
occurrences of x, we have the program

    let x=a in b.

The remaining constructions are standard and written if x then $a_0$ else $a_1$,
x.f, x.m and x.f:=y, where $a_0$ and $a_1$ are programs, x and y are variables,
f is a field identifier and m is a method identifier.

The program let x=a in b introduces the variable x. Execution of this program
evaluates a, then b with occurrences of x replaced with the previous result of
evaluating a. Variables cannot be assigned to. However, a result can be an
object name which does have state, and in this way mutable storage cells can be
encoded as objects with one field. Since a is executed before b, we can define
sequential composition a; b in the usual way.

For arbitrary programs a with object type and field f, it is, in general, not
possible to write a.f with the intention to mean “evaluate a then look up f
in the result.” Such a program has to be encoded via the let construction as
let x=a in x.f. Similarly, it is in general not possible to write a.f:=b nor
a.m. This restriction simplifies the rules since evaluation of a variable is not
side-effecting. Evaluation of an arbitrary program a is possibly side-effecting.
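
The syntactic restrictions and the two let-based encodings just described can
be written down as a small datatype. This is a first-order sketch for
illustration only (the implementation described later uses HOAS instead); the
constructor names and the helpers free_vars, fresh, seq and select_of are
assumptions introduced here, not part of the original development.

```ocaml
type var = string

(* Programs of the object calculus; fields, conditions and receivers are
   restricted to variables, as in the text. *)
type prog =
  | Bool of bool
  | Nat of int
  | Var of var
  | Obj of (string * var) list * (string * (var * prog)) list
      (* [f_i = x_i, m_j = ς(s) b_j]: each method binds its self variable *)
  | Let of var * prog * prog
  | If of var * prog * prog
  | Select of var * string (* x.f *)
  | Invoke of var * string (* x.m *)
  | Update of var * string * var (* x.f := y *)

let rec free_vars = function
  | Bool _ | Nat _ -> []
  | Var x -> [ x ]
  | Obj (fields, methods) ->
      List.map snd fields
      @ List.concat_map
          (fun (_, (self, body)) ->
            List.filter (fun v -> v <> self) (free_vars body))
          methods
  | Let (x, a, b) -> free_vars a @ List.filter (fun v -> v <> x) (free_vars b)
  | If (x, a0, a1) -> x :: (free_vars a0 @ free_vars a1)
  | Select (x, _) | Invoke (x, _) -> [ x ]
  | Update (x, _, y) -> [ x; y ]

(* Hypothetical helper: prime [base] until it no longer occurs in [avoid]. *)
let rec fresh avoid base =
  if List.mem base avoid then fresh avoid (base ^ "'") else base

(* Sequential composition a; b as let x=a in b, with x not free in b. *)
let seq a b = Let (fresh (free_vars b) "tmp", a, b)

(* "Evaluate a, then look up f", encoded as let x=a in x.f. *)
let select_of a f = Let ("tmp", a, Select ("tmp", f))
```

Because Select, Invoke and Update only accept variables, an ill-formed program
such as a.f for a compound a cannot even be constructed; it has to go through
select_of.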

Since object terms evaluate to references, we also have the phenomenon of
aliasing. For example, in the program


    let x=a in (let y=x in b) ,


the variables x and y both refer to the same object during the execution of b:
changes to x through field updates and method invocations also change y.

The semantics is given in terms of an abstract machine with a stack and a
store. All terminating programs evaluate to a result in R which is the set of
references (referred to as object names H) and constants. An evaluation relation
is introduced, written


    $S, \sigma \vdash a \leadsto v, \sigma'$ ,


meaning program a evaluated with stack S and initial store $\sigma$ terminates
with result v and final store $\sigma'$. We also have the set F of fieldnames
and M of methodnames. A stack is a partial mapping from variables to results. A
store is a mapping from object names to a pair of records: one of results
indexed by field names, and one of method closures indexed by method names.
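
To make the stack/store reading concrete, here is a small evaluator in the
shape of the relation above. It is a sketch under assumptions rather than the
semantics of [2] verbatim: object names are integers, a method closure pairs
the method with the stack at object creation, field update is assumed to
return the updated object's reference, and every identifier below (eval,
lookup_obj, alias_demo, the constructor names) is hypothetical.

```ocaml
module SMap = Map.Make (String)
module IMap = Map.Make (Int)

type prog =
  | Const of result
  | Var of string
  | Obj of (string * string) list * (string * (string * prog)) list
  | Let of string * prog * prog
  | If of string * prog * prog
  | Select of string * string
  | Invoke of string * string
  | Update of string * string * string

and result = Bool of bool | Nat of int | Ref of int (* Ref h: object name *)

type stack = result SMap.t

(* One record of field values and one of method closures (self, body, stack). *)
type obj = { fields : result SMap.t; methods : (string * prog * stack) SMap.t }
type store = obj IMap.t

let lookup_obj stack store x =
  match SMap.find x stack with
  | Ref h -> (h, IMap.find h store)
  | _ -> failwith (x ^ " is not an object")

(* eval stack store a = (v, store'), mirroring  S, σ ⊢ a ⇝ v, σ'. *)
let rec eval (stack : stack) (store : store) (a : prog) : result * store =
  match a with
  | Const v -> (v, store)
  | Var x -> (SMap.find x stack, store)
  | Obj (fs, ms) ->
      let h = IMap.cardinal store in (* fresh name: nothing is ever freed *)
      let fields =
        List.fold_left
          (fun m (f, x) -> SMap.add f (SMap.find x stack) m)
          SMap.empty fs
      in
      let methods =
        List.fold_left
          (fun m (n, (self, b)) -> SMap.add n (self, b, stack) m)
          SMap.empty ms
      in
      (Ref h, IMap.add h { fields; methods } store)
  | Let (x, a, b) ->
      let v, store' = eval stack store a in
      eval (SMap.add x v stack) store' b
  | If (x, a0, a1) -> (
      match SMap.find x stack with
      | Bool c -> eval stack store (if c then a0 else a1)
      | _ -> failwith (x ^ " is not a boolean"))
  | Select (x, f) ->
      let _, o = lookup_obj stack store x in
      (SMap.find f o.fields, store)
  | Invoke (x, m) ->
      let h, o = lookup_obj stack store x in
      let self, body, env = SMap.find m o.methods in
      eval (SMap.add self (Ref h) env) store body
  | Update (x, f, y) ->
      let h, o = lookup_obj stack store x in
      let o' = { o with fields = SMap.add f (SMap.find y stack) o.fields } in
      (Ref h, IMap.add h o' store)

(* Aliasing: y names the same object as x, so an update via x is seen via y. *)
let alias_demo =
  Let ("z", Const (Nat 0),
    Let ("one", Const (Nat 1),
      Let ("x", Obj ([ ("f", "z") ], []),
        Let ("y", Var "x",
          Let ("_", Update ("x", "f", "one"), Select ("y", "f"))))))
```

Evaluating alias_demo from an empty stack and store reads the field through y
after it was updated through x, which is the aliasing behaviour described
above.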

The verification logic is used for deriving judgements of the form


    $\Gamma \vdash a : A :: T$


where $\Gamma$ is a context, $a$ is a program, $A$ a specification and $T$ is a
transition relation. A context is a sequence $x_0{:}A_0, \ldots, x_k{:}A_k$ of
variable/specification pairs. Transition relations describe the behaviour of
executing a program. They play the role of the assertions p and q in a Hoare
triple {p}S{q} and use the pseudo-variables
$(\grave{\sigma}, \grave{\mathit{alloc}})$ and
$(\acute{\sigma}, \acute{\mathit{alloc}})$ to refer, respectively, to the store
before and after execution, and $r$ to refer to the result. As a pair,
$(\sigma, \mathit{alloc})$ can be considered as a (curried) partial function of
type $H \to (F \cup M) \to R$. Formally $\sigma$ is a total function of type
$(H \times (F \cup M)) \to R$ and $\mathit{alloc}$ is a predicate over $H$
that defines the domain of the partial function. A specification describes what
a result from executing a program can potentially do. It can be thought of as
the interface of an object. Specifications are necessary because a result can be
an object with methods which are in essence “thunked” functions in the sense of
suspended computations. For compositionality of these verification judgements,
it is important that we can deduce what potential behaviour an object has.

We follow [2] and introduce the abbreviation Res for creating transition
relations, defined by

    $Res(e) \triangleq r = e \wedge (\forall x, y).(\grave{\sigma}(x,y) = \acute{\sigma}(x,y) \wedge (\grave{\mathit{alloc}}(x) = \acute{\mathit{alloc}}(x)))$ .

The predicate Res(e) is used to describe an execution where the result of
evaluation is e and the store is not changed. For example, we have the constant
rule


    $\dfrac{E \vdash \diamond}{E \vdash \mathit{false} : \mathit{Bool} :: Res(\mathit{false})}$


(where the judgement $E \vdash \diamond$ simply means that E is a welldefined
environment). This rule states that evaluating a constant does not change the
store and the result of evaluation is the constant itself. Another easy rule is
that for field lookup:


    $\dfrac{E \vdash x : [f{:}A] :: Res(x)}{E \vdash x.f : A :: Res(\grave{\sigma}(x,f))}$


Note that in the premise, we have the simple judgement
$E \vdash x : [f{:}A] :: Res(x)$. To apply this rule to variables that have more
fields or even methods, we must apply the subsumption rule that is defined
below.

By allowing transition relations to be strengthened in a subspecification,
the subtype relation is straightforwardly extended to specifications to give a
subspecification relation <:. The formal definition can be found in Sec. 3.3 of
[2].

Using the subspecification relation, we have the important rule of subsumption
for verification judgments:


    $\dfrac{\vdash A <: A' \qquad \vdash_{fol} T \Rightarrow T' \qquad E \vdash a : A :: T}{E \vdash a : A' :: T'}$


provided $A'$ and $T'$ are wellformed. Here $\vdash_{fol} \varphi$ denotes
provability in first-order logic augmented with the standard axioms for
equality.
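We also have the let rule. This rule is defined


    $\dfrac{\begin{array}{c} E \vdash a : A :: T \qquad E, x{:}A \vdash b : B :: T' \\ E \vdash B \qquad E \vdash T'' \text{ is a transition relation} \\ \vdash_{fol} T[\check{\sigma}/\acute{\sigma}, \check{\mathit{alloc}}/\acute{\mathit{alloc}}, x/r] \wedge T'[\check{\sigma}/\grave{\sigma}, \check{\mathit{alloc}}/\grave{\mathit{alloc}}] \Rightarrow T'' \end{array}}{E \vdash \mathsf{let}\ x{=}a\ \mathsf{in}\ b : B :: T''}$


where the judgement $E \vdash B$ means that B is a wellformed specification.
Intuitively, we have $E \vdash \mathsf{let}\ x{=}a\ \mathsf{in}\ b : B :: T''$
provided $T''$ is a consequence of $T$ and $T'$. The substitutions
$[\check{\sigma}/\acute{\sigma}, \check{\mathit{alloc}}/\acute{\mathit{alloc}}, x/r]$
and $[\check{\sigma}/\grave{\sigma}, \check{\mathit{alloc}}/\grave{\mathit{alloc}}]$
handle the intermediate state that exists after executing a but before executing
b and also the assignment of the result of a to variable x.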

Apart from the subsumption rule, the let rule is the only rule that cannot
be applied backwards. This is because the premise contains the following formula


\[ \vdash_{\mathrm{fol}} T[\check{\sigma}/\acute{\sigma}, \check{\mathit{alloc}}/\acute{\mathit{alloc}}, x/r] \wedge T'[\check{\sigma}/\grave{\sigma}, \check{\mathit{alloc}}/\grave{\mathit{alloc}}] \Rightarrow T'' \]


and the formulae on the left of the connective do not appear in the conclusion
of the rule. This formula is necessary because


\[ T[\check{\sigma}/\acute{\sigma}, \check{\mathit{alloc}}/\acute{\mathit{alloc}}, x/r] \wedge T'[\check{\sigma}/\grave{\sigma}, \check{\mathit{alloc}}/\grave{\mathit{alloc}}] \]


is not a transition relation, since it possibly has free occurrences of $\check{\sigma}$, $\check{\mathit{alloc}}$ and
x. In a higher-order setting, we can use existential quantification to bind such
free variables. Note that this problem is not present in Hoare logic because the
intermediate state is existentially quantified in the metalogic.


3 Implementation 


We implement a logic that is based on that presented in [2]. In the implementa- 
tion we: encode the language syntax using HOAS; and use the meta-logic itself 
as the logic for writing transition relations. 

Since our meta-logic is higher-order, our logic differs from that of [2] in that 
transition relations are now higher-order formulae. We appropriately modify the 
premises in the subsumption and let rules to take into account this difference. 

A convenient consequence of our use of higher-order logic is that we can 
derive new rules that are often easier to use. 


3.1 Metalanguage 


We now introduce the metalanguage which is used to present the implementation 
of the program logic. The metalanguage is based on a higher-order simply-typed 
lambda calculus. 

For base types, we have the natural numbers nat, booleans bool, variables
VV, fieldnames FN, methodnames MN, results EE, specifications SS and program
terms PP. We have the usual type formers: propositions Prop, function space
$\tau_1 \to \tau_2$, product space $\tau_1 \times \tau_2$, $\tau_1$-indexed record $\mathrm{Rcd}_{\tau_1}^{\tau_2}$ with entries from $\tau_2$,
variables x, application $e_1\, e_2$ and abstraction $\lambda x^{\tau}.e$. The type of transition
relations TR is defined to be $EE \to (EE \to FN \to EE)^2 \to (EE \to \mathrm{Prop})^2 \to \mathrm{Prop}$,
where we write $\cdots \to A^2 \to \cdots$ as shorthand for $\cdots \to A \to A \to \cdots$.

We define a higher-order classical logic using the constants $\forall_{\tau} : (\tau \to \mathrm{Prop}) \to \mathrm{Prop}$
and $\supset\, : \mathrm{Prop} \to \mathrm{Prop} \to \mathrm{Prop}$ for universal quantification and implication
respectively. We take the standard classical higher-order logic encodings of
conjunction ($\wedge$), disjunction ($\vee$), negation ($\neg$), existential quantification ($\exists$) and
Leibniz equality $=_{\tau}\, : \tau \to \tau \to \mathrm{Prop}$.
For predicates $P, Q : \tau \to \mathrm{Prop}$, we write $P \subseteq Q$ as shorthand for $\forall_{\tau} x. P(x) \supset Q(x)$.
In particular, for sets A and B, the formula $A \subseteq B$ simply means that A
is a subset of B, as expected. We define composition of T : TR and $U : EE \to TR$,
written T;U, by


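\[ (T;U)(r, \grave{\sigma}, \acute{\sigma}, \grave{\mathit{alloc}}, \acute{\mathit{alloc}}) \;\triangleq\; \exists \check{r}, \check{\sigma}, \check{\mathit{alloc}}.\; T(\check{r}, \grave{\sigma}, \check{\sigma}, \grave{\mathit{alloc}}, \check{\mathit{alloc}}) \,\wedge\, U(\check{r}, r, \check{\sigma}, \acute{\sigma}, \check{\mathit{alloc}}, \acute{\mathit{alloc}}) \]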
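The composition T;U can be made concrete on a toy universe. The following is only a sketch under simplifying assumptions that are not part of the paper: results and stores are small integers, a store is a single cell, and the alloc components are dropped. The names RESULTS, STORES, res and compose are hypothetical.

```python
from itertools import product

# Toy universe (assumption): results and stores are the integers 0..2,
# and a "store" is just the value held in a single cell.
RESULTS = range(3)
STORES = range(3)

def res(e):
    """Res(e): the result is e and the store is unchanged."""
    return lambda r, pre, post: r == e and pre == post

def compose(T, U):
    """(T;U): some intermediate result x and store mid link T to U(x)."""
    def TU(r, pre, post):
        return any(T(x, pre, mid) and U(x)(r, mid, post)
                   for x, mid in product(RESULTS, STORES))
    return TU

# T returns the current cell value and leaves the store alone.
T = lambda r, pre, post: r == pre and post == pre
# U(x) writes (x + 1) mod 3 into the cell and returns the written value.
U = lambda x: (lambda r, pre, post: post == (x + 1) % 3 and r == post)
TU = compose(T, U)
```

The existential quantifier of the definition becomes a search over all candidate intermediate results and stores, which is only feasible because the toy universe is finite.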


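We have for constants: the normal constants for natural numbers (0, succ,
+, ...); booleans (ff, tt, and, ...); product types ($\pi_1$, $\pi_2$, $(-,-)$); and the
record manipulation constants:


\[ \begin{array}{ll}
\mathrm{lookup}_{\tau_1,\tau_2} : \mathrm{Rcd}_{\tau_1}^{\tau_2} \to \tau_1 \to \tau_2 &
\mathrm{update}_{\tau_1,\tau_2} : \mathrm{Rcd}_{\tau_1}^{\tau_2} \to \tau_1 \to \tau_2 \to \mathrm{Rcd}_{\tau_1}^{\tau_2} \\
\mathrm{empty}_{\tau_1,\tau_2} : \mathrm{Rcd}_{\tau_1}^{\tau_2} &
\mathrm{domain}_{\tau_1,\tau_2} : \mathrm{Rcd}_{\tau_1}^{\tau_2} \to \tau_1 \to \mathrm{Prop} .
\end{array} \]


We omit type annotations whenever possible, use parentheses with commas
for (possibly) repeated application of terms, and infix notation where
appropriate. We use the more succinct notation $[f_1 = a_1, \ldots, f_n = a_n]$ for records, and
for a of type $\mathrm{Rcd}_{\tau_1}^{\tau_2}$, we write $a_e$ for lookup(a, e). We write $f \in \mathrm{dom}(r)$ for
$\mathrm{domain}_{\tau_1,\tau_2}(r, f)$.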

We have two types of judgements: typing judgements $\Gamma \vdash e : \tau$ and validity
judgements $\Gamma \vdash \varphi$ provided $\Gamma \vdash \varphi : \mathrm{Prop}$. We take the standard typing rules
of simply typed lambda calculus and the standard logical rules and axioms of
classical higher-order logic with Leibniz equality.

The intended interpretation of an element of a record type is a partial function
with finite domain. Equality over record types is therefore the standard equality
of partial functions, namely equality of their graphs. We have a number of
axioms on records that delineate this interpretation.
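The intended reading of records as finite partial functions can be sketched with dictionaries. This is an illustration only, not the LEGO encoding; the function names mirror the constants empty, update, lookup and domain, and the use of Python dicts is an assumption made for the example.

```python
def empty():
    """The record with empty domain."""
    return {}

def update(rcd, key, value):
    """Return a new record that agrees with rcd except at key."""
    new = dict(rcd)
    new[key] = value
    return new

def lookup(rcd, key):
    """Value stored at key; only meaningful when key is in the domain."""
    return rcd[key]

def domain(rcd, key):
    """Whether key belongs to the (finite) domain of the record."""
    return key in rcd

# Two records built in different orders have the same graph, hence are equal.
r1 = update(update(empty(), "f", 1), "g", 2)
r2 = update(update(empty(), "g", 2), "f", 1)
```

Dictionary equality in Python compares key/value pairs regardless of insertion order, which matches equality of graphs.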


3.2 Program Logic 


The remaining constants are those for the program logic: specification
constructors nat : SS, bool : SS and $\mathrm{obj} : \mathrm{Rcd}_{FN}^{SS} \to \mathrm{Rcd}_{MN}^{(EE \to SS) \times (EE \to TR)} \to SS$; specification
subsumption $<: \; : SS \to SS \to \mathrm{Prop}$; program term constructors false : PP,
true : PP, $\mathrm{let} : PP \to (VV \to PP) \to PP$, $\mathrm{obj} : \mathrm{Rcd}_{FN}^{VV} \to \mathrm{Rcd}_{MN}^{VV \to PP} \to PP$,
$\mathrm{if} : VV \to PP \to PP \to PP$, $\mathrm{var} : VV \to PP$, $\mathrm{fsel} : VV \to FN \to PP$,
$\mathrm{minv} : VV \to MN \to PP$ and $\mathrm{fupd} : VV \to FN \to VV \to PP$; the value
constructors $\mathrm{bool}_{EE} : \mathrm{bool} \to EE$ and $\mathrm{nat}_{EE} : \mathrm{nat} \to EE$; and the formal symbol that
represents the stack, $\mathrm{var}_{EE} : VV \to EE$. The constants $\mathrm{bool}_{EE}$, $\mathrm{nat}_{EE}$ and $\mathrm{var}_{EE}$ can
be seen as coercion functions and will therefore be omitted in this presentation.

Constant let gives our encoding of syntax its higher-order nature. As an
illustration, the program


let x=true in if x then false else true

has encoding


let(true, λx. if(x, false, true)) .

We generalise the encoding procedure and write $\ulcorner a \urcorner$ for the encoding of a.
Most important of all, we have the constant $[- : - :: -] : PP \to SS \to TR \to \mathrm{Prop}$,
and we encode the rules so that if


\[ x_1 : A_1, \ldots, x_n : A_n \vdash a : A :: T \]

then we have

\[ \forall x_1, \ldots, x_n.\; [x_1 : \ulcorner A_1 \urcorner] \supset \cdots \supset [x_n : \ulcorner A_n \urcorner] \supset [\ulcorner a \urcorner : \ulcorner A \urcorner :: \ulcorner T \urcorner] \]


where $[- : -]$ is defined by $[x : A] \triangleq [\mathrm{var}(x) : A :: \mathrm{Res}(x)]$.
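The higher-order flavour of the let constructor can be sketched by representing the body of a let as a host-language function. The classes Const, If and Let and the evaluator below are hypothetical, and string-valued variable names are an assumption; the actual encoding lives in the LEGO metalanguage.

```python
import itertools
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Const:
    value: bool

@dataclass(frozen=True)
class If:
    cond: str          # the guard is a variable, as in if : VV -> PP -> PP -> PP
    then: object
    orelse: object

@dataclass(frozen=True)
class Let:
    bound: object
    body: Callable[[str], object]   # HOAS: the binder is a Python function

_fresh = itertools.count()

def evaluate(term, env=None):
    """Evaluate a closed term; let bodies are opened with a fresh variable name."""
    env = {} if env is None else env
    if isinstance(term, Const):
        return term.value
    if isinstance(term, If):
        return evaluate(term.then if env[term.cond] else term.orelse, env)
    if isinstance(term, Let):
        x = f"x{next(_fresh)}"
        return evaluate(term.body(x), {**env, x: evaluate(term.bound, env)})
    raise TypeError(f"not a term: {term!r}")

# Encoding of: let x=true in if x then false else true
example = Let(Const(True), lambda x: If(x, Const(False), Const(True)))
```

Because binding is delegated to Python's lambda, scoping and renaming of bound variables need no separate treatment in the object syntax.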

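Hereafter, unless explicitly typed otherwise, (meta-)variables and their decorated
variants are typed as follows: n has type nat; x and y have type VV; f has type FN;
m has type MN; a has type PP; b has type $VV \to PP$; A has type SS; B has type
$EE \to SS$; T has type TR; and U has type $EE \to TR$.


Subsumption axioms. The following axioms are for the subsumption relation.


\[ \begin{array}{l}
\forall_{\mathrm{Rcd}_{FN}^{SS}} A, A'.\; \forall_{\mathrm{Rcd}_{MN}^{(EE \to SS) \times (EE \to TR)}} B, B'. \\
\quad (\mathrm{dom}(A') \subseteq \mathrm{dom}(A)) \supset \\
\quad (\forall f \in \mathrm{dom}(A').\, A_f = A'_f) \supset \\
\quad (\mathrm{dom}(B') \subseteq \mathrm{dom}(B)) \supset \\
\quad (\forall m \in \mathrm{dom}(B').\, \pi_1(B_m) <: \pi_1(B'_m)) \supset \\
\quad (\forall m \in \mathrm{dom}(B').\, \forall y.\, \pi_2(B_m)(y) \subseteq \pi_2(B'_m)(y)) \supset \\
\quad \mathrm{obj}(A, B) <: \mathrm{obj}(A', B') \qquad \text{(ss\_obj)} \\[4pt]
\mathrm{bool} <: \mathrm{bool} \qquad \text{(ss\_bool)} \\
\mathrm{nat} <: \mathrm{nat} \qquad \text{(ss\_nat)}
\end{array} \]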


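Program axioms. The remaining axioms are those of the program logic. They
are encoded as follows.


\[ \begin{array}{l}
\forall a.\, \forall A, A'.\, \forall T, T'. \\
\quad (A <: A') \supset (T \subseteq T') \supset \\
\quad [a : A :: T] \supset [a : A' :: T'] \qquad \text{(ws\_subs)} \\[4pt]
[\mathrm{false} : \mathrm{bool} :: \mathrm{Res}(\mathrm{ff})] \qquad \text{(ws\_constf)} \\
[\mathrm{true} : \mathrm{bool} :: \mathrm{Res}(\mathrm{tt})] \qquad \text{(ws\_constt)} \\
\forall n.\, [\mathrm{nat}(n) : \mathrm{nat} :: \mathrm{Res}(n)] \qquad \text{(ws\_nat)}
\end{array} \]


As an example of functions over constants, for any binary natural number
operation op, we have


\[ \begin{array}{l}
\forall x_0, x_1. \\
\quad [x_0 : \mathrm{nat}] \supset [x_1 : \mathrm{nat}] \supset \\
\quad [\mathrm{op}(x_0, x_1) : \mathrm{nat} :: \mathrm{Res}(\mathrm{op}(x_0, x_1))] \qquad \text{(ws\_natop)}
\end{array} \]

\[ \begin{array}{l}
\forall x.\, \forall a_0, a_1.\, \forall B.\, \forall U.\, \forall B_0, B_1.\, \forall U_0, U_1. \\
\quad [x : \mathrm{bool}] \supset \\
\quad [a_0 : B_0(x) :: U_0(x)] \supset \\
\quad (B_0(\mathrm{tt}) = B(\mathrm{tt}) \wedge U_0(\mathrm{tt}) = U(\mathrm{tt})) \supset \\
\quad [a_1 : B_1(x) :: U_1(x)] \supset \\
\quad (B_1(\mathrm{ff}) = B(\mathrm{ff}) \wedge U_1(\mathrm{ff}) = U(\mathrm{ff})) \supset \\
\quad [\mathrm{if}(x, a_0, a_1) : B(x) :: U(x)] \qquad \text{(ws\_cond)}
\end{array} \]

\[ \begin{array}{l}
\forall a.\, \forall b.\, \forall A', T''.\, \forall A, T.\, \forall U. \\
\quad [a : A :: T] \supset \\
\quad (\forall x.\, [x : A] \supset [b(x) : A' :: U(x)]) \supset \\
\quad (T;U \subseteq T'') \supset \\
\quad [\mathrm{let}(a, b) : A' :: T''] \qquad \text{(ws\_let)}
\end{array} \]

\[ \begin{array}{l}
\forall x.\, \forall m.\, \forall B.\, \forall U. \\
\quad [x : \mathrm{obj}([\,], [m = (B, U)])] \supset \\
\quad [\mathrm{minv}(x, m) : B(x) :: U(x)] \qquad \text{(ws\_minv)}
\end{array} \]

\[ \begin{array}{l}
\forall x.\, \forall f.\, \forall A.\, \forall T. \\
\quad [x : \mathrm{obj}([f = A], [\,])] \supset \\
\quad (\forall r.\, \forall \grave{\sigma}, \acute{\sigma}.\, \forall \grave{\mathit{alloc}}, \acute{\mathit{alloc}}. \\
\quad\quad T(r, \grave{\sigma}, \acute{\sigma}, \grave{\mathit{alloc}}, \acute{\mathit{alloc}}) = \mathrm{Res}(\grave{\sigma}(x, f), r, \grave{\sigma}, \acute{\sigma}, \grave{\mathit{alloc}}, \acute{\mathit{alloc}})) \supset \\
\quad [\mathrm{fsel}(x, f) : A :: T] \qquad \text{(ws\_fsel)}
\end{array} \]

\[ \begin{array}{l}
\forall_{\mathrm{Rcd}_{FN}^{VV}} a.\; \forall_{\mathrm{Rcd}_{MN}^{VV \to PP}} b.\; \forall_{\mathrm{Rcd}_{FN}^{SS}} A.\; \forall_{\mathrm{Rcd}_{MN}^{(EE \to SS) \times (EE \to TR)}} B.\; \forall T. \\
\quad (\mathrm{dom}(a) = \mathrm{dom}(A) \wedge \mathrm{dom}(b) = \mathrm{dom}(B)) \supset \\
\quad (\forall f \in \mathrm{dom}(a).\, [\mathrm{var}(a_f) : A_f]) \supset \\
\quad (\forall m \in \mathrm{dom}(b).\, \forall y.\, [y : \mathrm{obj}(A, B)] \supset [b_m(y) : \pi_1(B_m)(y) :: \pi_2(B_m)(y)]) \supset \\
\quad (\forall r.\, \forall \grave{\sigma}, \acute{\sigma}.\, \forall \grave{\mathit{alloc}}, \acute{\mathit{alloc}}. \\
\quad\quad T(r, \grave{\sigma}, \acute{\sigma}, \grave{\mathit{alloc}}, \acute{\mathit{alloc}}) = (\forall z.\, z \neq r \supset \grave{\mathit{alloc}}(z) = \acute{\mathit{alloc}}(z)) \;\wedge \\
\quad\quad\quad (\forall f \in \mathrm{dom}(a).\, \acute{\sigma}(r, f) = a_f) \;\wedge \\
\quad\quad\quad (\forall z.\, \forall w.\, z \neq r \supset \grave{\sigma}(z, w) = \acute{\sigma}(z, w))) \supset \\
\quad [\mathrm{obj}(a, b) : \mathrm{obj}(A, B) :: T] \qquad \text{(ws\_obj)}
\end{array} \]

\[ \begin{array}{l}
\forall x, y.\, \forall f.\; \forall_{\mathrm{Rcd}_{FN}^{SS}} A.\; \forall_{\mathrm{Rcd}_{MN}^{(EE \to SS) \times (EE \to TR)}} B.\; \forall T. \\
\quad [x : \mathrm{obj}(A, B)] \supset \\
\quad [y : A_f] \supset \\
\quad (\forall r.\, \forall \grave{\sigma}, \acute{\sigma}.\, \forall \grave{\mathit{alloc}}, \acute{\mathit{alloc}}. \\
\quad\quad T(r, \grave{\sigma}, \acute{\sigma}, \grave{\mathit{alloc}}, \acute{\mathit{alloc}}) = (r = x \wedge \acute{\sigma}(x, f) = y \;\wedge \\
\quad\quad\quad (\forall z.\, \forall w.\, \neg(z = x \wedge w = f) \supset \grave{\sigma}(z, w) = \acute{\sigma}(z, w)) \;\wedge \\
\quad\quad\quad \grave{\mathit{alloc}} = \acute{\mathit{alloc}})) \supset \\
\quad [\mathrm{fupd}(x, f, y) : \mathrm{obj}(A, B) :: T] \qquad \text{(ws\_fupd)}
\end{array} \]


4 Examples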


We have attempted several examples in our implementation. Initially, we followed 
the development in [2] and introduced some abbreviations to make our programs 
more succinct. Furthermore we derived, using the existing axioms, theorems (or 
equivalently, derivable rules) for these syntactic abbreviations. 

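For example, in the pure language, field selection strictly has the form x.f
where x is a variable. We introduce an abbreviation so that for an arbitrary
program a, the program a.f is an abbreviation for let x=a in x.f, for x not free
in a. This is encoded into our formal system by defining


\[ \mathrm{fsel}'(a, f) \;\triangleq\; \mathrm{let}(a, \lambda x.\, \mathrm{fsel}(x, f)) . \]


We shall simply overload our existing notation and write fsel for fsel'. Crucially,
we then prove the following theorem.


\[ \begin{array}{l}
\vdash \forall a.\, \forall f.\, \forall A.\, \forall T', T. \\
\quad [a : \mathrm{obj}([f = A], [\,]) :: T] \supset \\
\quad (\forall r.\, \forall \grave{\sigma}, \acute{\sigma}.\, \forall \grave{\mathit{alloc}}, \acute{\mathit{alloc}}. \\
\quad\quad T'(r, \grave{\sigma}, \acute{\sigma}, \grave{\mathit{alloc}}, \acute{\mathit{alloc}}) = \\
\quad\quad\quad \exists \check{r}, \check{\sigma}, \check{\mathit{alloc}}.\; T(\check{r}, \grave{\sigma}, \check{\sigma}, \grave{\mathit{alloc}}, \check{\mathit{alloc}}) \wedge \mathrm{Res}(\check{\sigma}(\check{r}, f), r, \check{\sigma}, \acute{\sigma}, \check{\mathit{alloc}}, \acute{\mathit{alloc}})) \supset \\
\quad [\mathrm{fsel}(a, f) : A :: T']
\end{array} \]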


(Note the use of the existential quantification in T’ to account for the interme- 
diate store and result of evaluating a.) This theorem allows us to directly derive 
judgments about programs using fsel without expanding its definition. 

We continue to extend and overload the remaining program constructors 
and derive corresponding “higher-level” rules. Using these rules, we successfully 
prove the examples given in Sec. 4.1-2 of [2]. We then derive an easier-to-use let 
rule before attempting two larger examples: the greatest common divisor pro- 
gram (Sec. 4.3 in [2]), and an original example based on the dining philosophers 
scenario. We consider these two examples in more detail after the new let rule. 


4.1 Reversible Let Rule 


As mentioned in Sec. 2 the let rule is not reversible. In particular, whenever 
we apply the let rule, we must decide on what information to lose. Since most 
programs use the let constructor extensively, this quickly becomes cumbersome. 

Using our implementation, we can derive the following substitute for the let 
rule. 


\[ \begin{array}{l}
\forall a.\, \forall b.\, \forall A', T''.\, \forall A, T.\, \forall U. \\
\quad [a : A :: T] \supset \\
\quad (\forall x.\, [x : A] \supset [b(x) : A' :: U(x)]) \supset \\
\quad (T;U = T'') \supset \\
\quad [\mathrm{let}(a, b) : A' :: T''] \qquad \text{(wsq\_let)}
\end{array} \]


This rule does not lose any information. In proof derivations, it is particularly
useful because information loss can be postponed until later using an explicit
application of the subsumption axiom.


4.2 Greatest Common Divisor


The gcd program from [2] can be written using notation closer to that of popular
OO languages, as follows. It is a simple exercise to translate this program into
our formal language.


calc_gcd ≜ λy.
    if (y.f < y.g) {y.g = y.g − y.f; y.m()}
    else if (y.g < y.f) {y.f = y.f − y.g; y.m()}
    else {y.f}


gcd ≜ obj([f=nat(1), g=nat(1)], [m=calc_gcd])


This program creates an object with one method m, such that if the fields have
nonzero values a and b, invoking m will reduce both fields to the gcd of a and
b. This is the intuition behind the formal specification given in [2]. To prove the
formal specification statement in our logic, we strengthen the transition relation
given in [2]. We can then prove that gcd satisfies the stronger specification.
The subsumption axiom can be used to return to the original statement. We
introduce the constant $\mathrm{gcd} : EE \to EE \to EE$ and add axioms consistent with its
interpretation as the gcd function over natural numbers. And so we define


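\[ \begin{array}{l}
U_{\mathrm{gcd}}(y) \;\triangleq\; \lambda r.\, \lambda \grave{\sigma}, \acute{\sigma}.\, \lambda \grave{\mathit{alloc}}, \acute{\mathit{alloc}}. \\
\quad (1 \le \grave{\sigma}(y, f) \wedge 1 \le \grave{\sigma}(y, g)) \supset \\
\quad\quad r = \acute{\sigma}(y, f) \wedge r = \acute{\sigma}(y, g) \;\wedge \\
\quad\quad r = \mathrm{gcd}(\grave{\sigma}(y, f), \grave{\sigma}(y, g)) \;\wedge \\
\quad\quad 1 \le \acute{\sigma}(y, f) \wedge 1 \le \acute{\sigma}(y, g)
\end{array} \]

\[ \mathrm{Spec}_{\mathrm{gcd}} \;\triangleq\; \mathrm{obj}([f = \mathrm{nat}, g = \mathrm{nat}], [m = (\mathrm{nat}, U_{\mathrm{gcd}})]) . \]

Using these definitions, we can prove

\[ \vdash [\mathrm{gcd} : \mathrm{Spec}_{\mathrm{gcd}} :: T_{\mathrm{triv}}] \]


where $T_{\mathrm{triv}}$ is a trivial transition relation.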
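The behaviour that the specification describes can be observed by running the program outside the logic. The class below is a hypothetical Python transcription of the gcd object (fields f and g, method m implementing calc_gcd); it illustrates the intended behaviour only and plays no part in the formal proof.

```python
import math

class GcdObject:
    """Object with fields f, g and a method m that performs subtractive gcd."""

    def __init__(self, f=1, g=1):
        self.f, self.g = f, g

    def m(self):
        # Mirrors calc_gcd: subtract the smaller field from the larger and recurse.
        if self.f < self.g:
            self.g -= self.f
            return self.m()
        elif self.g < self.f:
            self.f -= self.g
            return self.m()
        return self.f

def satisfies_postcondition(a, b):
    """For a, b >= 1: after m(), the result and both fields equal gcd(a, b)."""
    o = GcdObject(a, b)
    r = o.m()
    return r == o.f == o.g == math.gcd(a, b)
```

The check covers only the case where both fields are at least 1, matching the premise of the transition relation; with a zero field the subtractive loop would not terminate.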


4.3 Dining Philosophers 


Object-oriented languages have proved to be particularly suitable for writing
simulations. In the next example, we consider a simulation in an OO language for
a formalisation of the dining philosophers scenario. The formalisation we choose
is based on that presented in Roscoe's book [9], where a general description of
the scenario can be found. Our implementation follows Roscoe's observation that
the important events that we should model are when the forks get picked up and
put down. To make the example more manageable, we only consider the case of
three philosophers at the table.

We simulate the scenario by creating an object for each fork, and an object
for each philosopher. The philosophers interact with the forks by invoking their
methods. In our example, two of the philosophers pick up their forks "left then
right" and one picks up his forks "right then left." The resulting system is
known not to deadlock. We prove this using a suitable formalisation of "does
not deadlock."


Here is code to create a fork object.


Fork ≜ obj([on_table=true],
           [try_pick_up=λs.if (s.on_table) {s.on_table = false; true}
                           else {false},
            put_down=λs.s.on_table = true; false])


A philosopher object invokes the try_pick_up method to pick up a fork. The 
method returns true after updating the fork object’s state if this is possible. It 
returns false if the fork is not on the table. 

We introduce the following definitions for creating the two types of philoso- 
phers. 


phil_tick ≜ λs.if (s.n_forks == 0 and s.hungry) {
                 if (s.fork1.try_pick_up()) {s.n_forks = 1; false}
                 else {false}
               } else if (s.n_forks == 1 and s.hungry) {
                 if (s.fork2.try_pick_up())
                   {s.n_forks = 2; s.hungry = false; false}
                 else {false}
               } else if (s.n_forks == 2) {
                 s.fork2.put_down(); s.n_forks = 1; false
               } else {
                 s.fork1.put_down(); s.n_forks = 0; s.hungry = true; false
               }


LRPhil ≜ λfork_l, fork_r.Phil(fork_l, fork_r)
RLPhil ≜ λfork_l, fork_r.Phil(fork_r, fork_l)


Phil ≜ λfork_1, fork_2.
         obj([hungry=true, n_forks=nat(0),
              fork1=var(fork_1), fork2=var(fork_2)],
             [tick=phil_tick])


A philosopher has four internal states: (1) he is hungry and is holding no forks;
(2) he is hungry and he is holding one fork; (3) he is no longer hungry¹ and is
holding two forks; and (4) he is not hungry and holding one fork. Each state
transition corresponds exactly to a fork being either picked up or put down.


¹ Here we assume that the philosopher instantaneously eats as soon as he picks up the
second fork and so is no longer hungry. The point is that the event corresponding to
a philosopher eating is not important with respect to deadlock considerations.

Finally, we put the whole system together as follows by creating a "table"
object.


Table ≜ let(fork_1 = Fork, fork_2 = Fork, fork_3 = Fork,
            phil_1 = LRPhil(fork_1, fork_2),
            phil_2 = LRPhil(fork_2, fork_3),
            phil_3 = RLPhil(fork_3, fork_1),
            obj(
              [f1=fork_1, f2=fork_2, f3=fork_3,
               p1=phil_1, p2=phil_2, p3=phil_3],
              [tick1=λs.s.p1.tick(),
               tick2=λs.s.p2.tick(),
               tick3=λs.s.p3.tick()]))


The table should be considered as a “black box” with three buttons, one for 
each of the tick methods. To complete the simulation, one must compose this 
program with another program that plays out the possible traces of the system. 

With the dining philosopher scenario simulated by these code fragments, we 
can prove that this system does not deadlock, which we will now formalise. We 
say that a philosopher is blocked whenever he needs to pick up a fork to perform 
a state transition but cannot (exactly when the fork in question is not on the 
table.) The system is deadlocked precisely when all the philosophers on the table 
are blocked. 

Given the store σ, we can determine whether any particular philosopher is
blocked by inspecting the values of the fields of the philosopher and its forks.
To assist our intuitions, we define the following predicates. "Philosopher p is
holding fork fork",


is_holding ≜ λp.λfork.λσ. σ(p, n_forks) = 1 ∧ fork = σ(p, fork1) ∨
                          σ(p, n_forks) = 2 ∧ fork = σ(p, fork1) ∨
                          σ(p, n_forks) = 2 ∧ fork = σ(p, fork2)


"Philosopher p is waiting for fork fork,"


waiting_for ≜ λp.λfork.λσ.
                (σ(p, n_forks) = 0 ∧ σ(σ(p, fork1), on_table) = ff
                   ∧ fork = σ(p, fork1))
              ∨ (σ(p, n_forks) = 1 ∧ σ(σ(p, fork2), on_table) = ff
                   ∧ fork = σ(p, fork2)) .


Assuming that F(t) is the set of forks, and P(t) the set of philosophers on table
t, for p ∈ P(t), the predicate "philosopher p is blocked,"


blocked ≜ λt.λp.λσ.∃_{F(t)} fork. waiting_for(p, fork, σ)

and "all philosophers are blocked,"


all_blocked ≜ λt.λσ.∀_{P(t)} p. blocked(p, σ)


One way to prove that our system does not deadlock is to use the following
fact. Let ≺ be a total order such that fork_1 ≺ fork_2 ≺ fork_3. It is the case that
the order in which the philosophers pick up their forks respects ≺. It is then
straightforward to prove that the system does not deadlock².
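The ordering argument can be cross-checked by exhaustively exploring a small model of the table. The sketch below is not the LEGO proof: it assumes a state made of three fork flags and three (n_forks, hungry) pairs, transcribes phil_tick and the waiting_for predicate by hand, and uses hypothetical names (tick, blocked, deadlock_reachable, ORDERED, CYCLIC).

```python
from collections import deque

def tick(assign, state, i):
    """One call of philosopher i's tick method (mirrors phil_tick)."""
    forks, phils = list(state[0]), list(state[1])
    n, hungry = phils[i]
    first, second = assign[i]          # indices of the fork1 and fork2 fields
    if n == 0 and hungry:
        if forks[first]:               # try_pick_up succeeds only if on the table
            forks[first], n = False, 1
    elif n == 1 and hungry:
        if forks[second]:
            forks[second], n, hungry = False, 2, False
    elif n == 2:
        forks[second], n = True, 1     # put_down fork2
    else:
        forks[first], n, hungry = True, 0, True
    phils[i] = (n, hungry)
    return tuple(forks), tuple(phils)

def blocked(assign, state, i):
    """Philosopher i is waiting for a fork that is not on the table."""
    forks, phils = state
    n, _ = phils[i]
    first, second = assign[i]
    return (n == 0 and not forks[first]) or (n == 1 and not forks[second])

def deadlock_reachable(assign):
    """Breadth-first search for a reachable state in which everyone is blocked."""
    start = ((True,) * 3, ((0, True),) * 3)
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        if all(blocked(assign, state, i) for i in range(3)):
            return True
        for i in range(3):
            nxt = tick(assign, state, i)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

ORDERED = [(0, 1), (1, 2), (0, 2)]   # two LR philosophers and one RL: respects the order
CYCLIC = [(0, 1), (1, 2), (2, 0)]    # all three "left then right": the classic cycle
```

In this model, deadlock_reachable(ORDERED) is False while deadlock_reachable(CYCLIC) is True, which agrees with the role the total order plays in the argument.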

For table t, order relation orel and store σ, if we define InvTable by


InvTable(t, orel, σ) ≜ ∀_{P(t)} p. InvPhil(p, orel, σ) ∧
                       (∀_{F(t)} f. σ(f, on_table) = ff ⊃
                            ∃_{P(t)} p. is_holding(p, f, σ)) ∧
                       f_p_relationship ∧ f_distinct ∧ p_distinct


where f_p_relationship states that the fork fields of the philosophers point to
the intended forks, f_distinct and p_distinct state that the fork and philosopher
objects are pairwise distinct and


InvPhil(p, orel, σ) ≜ orel(σ(p, fork1), σ(p, fork2)) ∧
                      (state_0(p) ∨ state_1(p) ∨ state_2(p) ∨ state_3(p))


where each state_i(p) states the values of the n_forks and hungry fields of
philosopher p at the corresponding state. It follows that if we define


Spec_Table ≜ obj([f1=Spec_Fork, f2=Spec_Fork, f3=Spec_Fork,
                  p1=Spec_Phil, p2=Spec_Phil, p3=Spec_Phil],
                 [tick1=TR_tick, tick2=TR_tick, tick3=TR_tick])

and

\[ TR_{\mathrm{tick}} \;\triangleq\; \lambda s, r.\, \lambda \grave{\sigma}, \acute{\sigma}.\, \lambda \grave{\mathit{alloc}}, \acute{\mathit{alloc}}.\; \mathrm{InvTable}(s, \grave{\sigma}) \supset \mathrm{InvTable}(s, \acute{\sigma}) \]

we can prove in our logic,

\[ \vdash [\mathrm{Table} : \mathrm{Spec}_{\mathrm{Table}} :: \lambda r.\, \lambda \grave{\sigma}, \acute{\sigma}.\, \lambda \grave{\mathit{alloc}}, \acute{\mathit{alloc}}.\; \mathrm{InvTable}(r, \prec, \acute{\sigma})] \]


That is, InvTable is an invariant of the system. It is an invariant in the sense that
it holds immediately after the table object is created, and it is invariant with
respect to the actions of the three "buttons." Of course, Spec_Fork and Spec_Phil
are specifications that are strong enough to describe the behaviour of fork and
philosopher objects respectively.

Furthermore, we can prove, for table t, philosopher p, forks fork, fork' and
store σ,


blocked(p, σ) ⊃ is_holding(p, fork, σ) ⊃ σ(p, fork1) = fork                      (1)
is_holding(p, fork, σ) ⊃ waiting_for(p, fork', σ) ⊃ σ(p, fork2) = fork'          (2)
InvTable(t, orel, σ) ⊃ ∀_{F(t)} fork.
    σ(fork, on_table) = ff ⊃ ∃_{P(t)} p. is_holding(p, fork, σ)                  (3)

and

InvTable(t, orel, σ) ⊃ ∀_{P(t)} p. orel(σ(p, fork1), σ(p, fork2)) .              (4)

Assuming this, it is straightforward to prove the corollary

InvTable(t, orel, σ) ⊃ ¬all_blocked(t, σ) ,                                      (5)


as required.


² This is in fact a special case of Roscoe's rules for avoiding deadlock in [9].


5 Soundness 


As for AL, we have the following soundness property.

Theorem 1. Assume that $\emptyset, \emptyset \vdash a \leadsto v, \sigma'$ is provable. For boolean b, if

\[ \vdash [\ulcorner a \urcorner : \mathrm{bool} :: \lambda r.\, \lambda \grave{\sigma}, \acute{\sigma}.\, \lambda \grave{\mathit{alloc}}, \acute{\mathit{alloc}}.\; r = b] \]

then v = b.


We prove this, or rather an appropriate generalisation involving open programs
and assumptions of the form $[x_i : A_i]$ where the $x_i$ are free program variables, by
induction on the operational semantics, along the lines of the soundness proof in [2].
Complication arises from the fact that the predicates $[- : -]$ and $[- : - :: -]$
can appear in the transition relations, as can the constants for program
constructions and specifications. We overcome this by assigning trivial meanings to
these when they appear in transition relations. For example, $[- : - :: -]$ can be
interpreted as constant true. The details will appear in an expanded version of
this article.³


6 Conclusions and Related Work 


Our implementation differs from other work not only because we use HOAS
but also because we embed the logic in the metalogic directly. Primarily for the
purpose of "language analysis," Nipkow et al. [4] have encoded an OO language
in Isabelle/HOL [5] using a "deep embedding" for the syntax. Similarly, Honsell
in [3] encodes the syntax of Dynamic Logic in a first-order style. Such encodings
allow justification arguments to be given by induction over the syntax. In [7],
Reddy presents an OO, Algol-like language IA+ based on Reynolds's Idealized
Algol [8] and its specification logic. Language IA+ uses HOAS and its specification
logic is higher-order, but programs can only create objects on the stack, not in
the heap. In contrast, the language of AL creates objects in the heap (global
store).

Our design decision to use HOAS has allowed us to quickly and succinctly
implement a verification system since we inherit scoping rules and alpha conversion
from the metalanguage. Directly embedding the logical connectives results in a
system that can take full advantage of the features of the underlying theorem
prover.

The use of a theorem prover has proved to be invaluable for keeping track
of the many assumptions that occur during the verification of nontrivial
examples. However, in practice, it is still difficult to verify, using LEGO, even small
examples such as those presented in this article. One often gets too involved
trying to prove "trivial" subgoals. Since our implementation does not use any
specific features of LEGO nor constructive type theory, we may find that using
a theorem prover with more automation, such as PVS or Isabelle/HOL, would
result in a more usable verification tool.


³ The originally submitted version of this article indicated a semantical soundness
proof. In the meantime, we have realised that this more direct approach is possible.


A Strong and Mechanizable Grand Logic*


M. Randall Holmes 


Boise State University 


Abstract. The purpose of this paper is to describe a "grand logic",
that is, a system of higher order logic capable of use as a general
purpose foundation for mathematics. This logic has developed as the logic
of a theorem proving system which has had a number of names in its
career (EFTTP, Mark2, and currently Watson), and the suitability of
this logic for computer-assisted formal proof is an aspect which will be
considered, though not thoroughly. A distinguishing feature of this
system is its relationship to Quine's set theory NF and related untyped
λ-calculi studied by the author.


1 Introduction 


The theory we develop here will be referred to as W, after the current name
"Watson" of the theorem prover in which it is implemented (for a more thorough
discussion of this prover see [8]). The notation of the system will be presented
just as it is presented to (and by) the theorem prover.

The roots of this logical system are in Quine's set theory "New Foundations"
(NF) of [10], but it cannot be described simply as an implementation of NF.
NF is not known to be consistent; the grand logic presented here is (partly)
based on the variation NFU of NF presented by Jensen in [9], which is known
to be consistent and suitable for applications (see [7] for a development). NF
and NFU are set theories; W is an untyped λ-calculus. NF and NFU are usually
presented using standard first-order logic; this system interprets the notions of
propositional and predicate logic in terms of its own rather different primitives.


2 Syntax 


The formal theory W presented here is an equational theory. All statements of W 
are equations between terms (intended for use as rewrite rules) and the focus of 
the theory is on the structure of terms rather than on the structure of proposi- 
tions. Terms representing truth values stand in for propositions, and the usual 
notions of propositional and predicate logic are expressed as operations on terms 
representing truth values. 

This section is devoted to the syntax of the language of W. First of all, if A
and B are terms, A = B is an equation (a statement of W); but the = operator
also occurs as a term constructor with the natural meaning (A = B is a term
which is equal to true if A = B holds and equal to false if A = B is false). The
overloading of = should always be easily disambiguated in what follows.


* The author gratefully acknowledges the support of US Army Research Office grant
DAAG55-98-1-0263

Any string of positive length consisting of characters taken from the sets of 
letters, digits, and the special characters ? and _ is an atomic term. 

Atomic terms are of four kinds: 


numerals: Any atomic term consisting only of digits is a numeral. (This ca- 
tegory may be regarded as subsumed under “constants” below: it is not of 
special logical interest). 

bound variables: An atomic term consisting of ? followed by a non-zero-initial 
numeral is a bound variable. 

free variables: An atomic term beginning with ? and containing another non- 
numeric character is a free variable. 

constants: An atomic term not beginning with ? and containing a non-numeric 
character is a constant. 
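The four kinds of atomic term can be told apart mechanically. The classifier below is a sketch based only on the description above; it is not Watson's lexer, and it assumes that "letters" means ASCII letters. Strings such as ? or ?0, which match none of the four descriptions, are reported as None.

```python
import re

# Assumption: atomic terms are non-empty strings over ASCII letters, digits, ? and _.
ATOM = re.compile(r"[A-Za-z0-9?_]+")

def classify(atom):
    """Return the kind of an atomic term, or None if no listed kind applies."""
    if not ATOM.fullmatch(atom):
        raise ValueError(f"not an atomic term: {atom!r}")
    if re.fullmatch(r"[0-9]+", atom):
        return "numeral"
    if re.fullmatch(r"\?[1-9][0-9]*", atom):
        return "bound variable"       # ? followed by a non-zero-initial numeral
    if atom.startswith("?"):
        # a free variable needs a further non-numeric character after the ?
        return "free variable" if re.search(r"[^0-9]", atom[1:]) else None
    return "constant"                 # no leading ?, and not made of digits only
```

For instance, classify("?3") gives "bound variable", classify("?x") gives "free variable", and classify("true") gives "constant".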


Before constructions of composite terms are introduced, a preliminary dis- 
cussion of kinds of operator is needed. 


operators: A string of special characters (not listed, but excluding all charac- 
ters found in atomic constants and excluding paired forms such as quotes, 
braces, brackets, and parentheses) is an operator. In particular, @ and @! are 
operators representing two different kinds of function application, and , is 
the ordered pair constructor. Also, a string of alphanumerics preceded by a 
backquote ‘ is an operator. 
It is important to note that an operator is not itself a term. We oversimplify 
by stipulating that each operator is either prefix or infix, but not both (there 
is some overloading in the prover). (Operators are declared infix or prefix in 
particular theories.) 


We now present the constructions of composite terms. 


prefix terms: A prefix term consists of a (prefix) operator followed by a term. 

abstraction terms: An abstraction term (a function) consists of a term
enclosed in brackets. (Abstraction terms implement λ-terms, and standard
λ-notation will sometimes be used).

parenthesized term: A term enclosed in parentheses.

infix terms: An infix term consists of an atomic, abstraction, or parenthesized
term, followed by an (infix) operator, followed by a term.

case expressions: A case expression consists of a parenthesized term, followed 
by ||, followed by a parenthesized term, followed by ,, followed by a term. 
The special operator || may only occur in terms of this form. 

reduction of parentheses: Parentheses around an atomic term or abstraction 
term may always be removed. Parentheses around an infix term or case 
expression may be removed except when it is the left subterm of an infix 
term or one of the two leftmost subterms of a case expression. If a term is 
obtained from another term by reduction of parentheses (or by the addition 
of parentheses for clarity), it is regarded as being the same term. 




completeness of description: The class of terms is the intersection of all sets 
containing all atomic terms and closed under the term constructions given 
above. 


This description of the syntax is based on the default order of precedence of 
the Watson prover, in which all operators have the same precedence and group 
to the right. 


3 Equational Logic 


The bedrock of the logic of W is equational logic. All statements in the language of 
W are equations, understood to be implicitly universally quantified over the free 
variables occurring in them. All free variables are untyped, with an exception 
described below (in the discussion of class abstraction). 

In this section we restrict ourselves to the sublanguage of the language of 
W which excludes bound variables. (Abstraction terms without bound variables 
may occur; these represent constant functions.) 

We define substitution for the restricted language without bound variables: if
A, T are terms and ?x is any variable, we define A{T/?x} as the result of replacing
all occurrences of the variable ?x with (T). (Of course, the parentheses may then
often be reduced away).

The basic rules of the equational logic of W are as follows: 


reflexivity: For any term A, A = A is a theorem.

symmetry: If A = B is a theorem, then B = A is a theorem.

transitivity: If A = B is a theorem and B = C is a theorem, then A = C is a
theorem.

localization: If A = B is a theorem and C is a term then C{A/?x} = C{B/?x}
is a theorem. (?x being any free variable).

specification: If A = B is a theorem and C is a term then A{C/?x} = B{C/?x}
is a theorem. (?x being any free variable).


These rules will need to be refined when bound variables are introduced. 


4 The Logic of Terms Defined by Cases 


We now consider the first part of the grand logic W, corresponding to
propositional logic and the logic of identity.

We introduce the predeclared constants true and false, representing the
truth values.

In a case expression T || U , V, we refer to the subterm T as the hypothesis
of the case expression and to the subterms U and V as its branches. The intended
meaning of the term (T = U) || V , W is "if T = U then V else W"; when T is
not an equation, T || U , V is intended to have the same meaning as (true =
T) || U , V.
We introduce axioms governing the behavior of the special term construction
of "case expressions".
The basic axioms are the following:


P1: ((?x = ?x) || ?y , ?z) = ?y

P2: ((true = false) || ?y , ?z) = ?z

HYP: ((?a = ?b) || (A{?a/?x}) , B) =
     ((?a = ?b) || (A{?b/?x}) , B)

DIST: (A{((?a = ?b) || ?c , ?d)/?x}) =
      (?a = ?b) || A{?c/?x} , A{?d/?x}


The axioms P1 and P2 implement special cases of our preformal understan- 
ding that a case expression will be equal to its first branch when the hypothesis is 
true and to its second branch when the hypothesis is false. In an expression (A = 
B) || T , U, it should be clear that we can freely replace A with B or vice versa 
in the context T without affecting the value of the term: this is captured by the 
axiom HYP. The name is taken from the idea that this implements “reasoning 
under hypotheses”. The axiom DIST allows the “distribution” of the hypothesis 
of a subterm over a larger context. 

It should be noted that HYP and DIST are axiom schemes rather than
single axioms (thanks to a referee for pointing out that I needed to say this!)
They could in principle be replaced in almost all applications by single axioms
of the form


*HYP: ((?a = ?b) || (?A @! ?a) , ?B) =
      ((?a = ?b) || (?A @! ?b) , ?B)

*DIST: (?A @! ((?a = ?b) || ?c , ?d)) =
       (?a = ?b) || (?A @! ?c) , (?A @! ?d)


where (as noted above) @! is a function application operator. In earlier versions
of the prover, axioms of the latter forms were actually used; applying such axioms
to get each instance of the full schemes involved λ-abstraction and β-reduction.
In the current version of the prover, there is built-in support for the application
of the schemes, not involving any use of the function machinery of the prover,
so it seems more natural to present the schemes.
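The intended meaning of *HYP and *DIST can be sanity-checked in a small semantic model. The sketch below is an assumption-laden illustration rather than anything Watson does: terms denote small integers, the context ?A is an ordinary Python function, and case_eq gives the informal "if a = b then ... else ..." reading of a case expression.

```python
from itertools import product

def case_eq(a, b, then, orelse):
    """Value of (a = b) || then , orelse under the intended reading."""
    return then if a == b else orelse

VALUES = [0, 1, 2]
# Sample contexts standing in for the function variable ?A (hypothetical choices).
FUNCS = [lambda v: v + 1, lambda v: 2 * v, lambda v: 0]

def hyp_holds(A, a, b, B):
    """*HYP: under the hypothesis a = b, A(a) may be replaced by A(b)."""
    return case_eq(a, b, A(a), B) == case_eq(a, b, A(b), B)

def dist_holds(A, a, b, c, d):
    """*DIST: applying A commutes with a case split on a = b."""
    return A(case_eq(a, b, c, d)) == case_eq(a, b, A(c), A(d))

all_ok = all(
    hyp_holds(A, a, b, B) and dist_holds(A, a, b, c, d)
    for A in FUNCS
    for a, b, c, d, B in product(VALUES, repeat=5)
)
```

A check over finitely many values is of course no proof of the schemes; it only confirms that both equations agree with the informal semantics on this model.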

The axiom set actually built into Watson is slightly larger, but we want 
to emphasize the extreme simplicity of the logic of case expressions presented 
here. The additional content provided by Watson can be presented as the pair 
of axioms: 


EQ: (?a = ?b) = (7a = 7b) || true , false 
GH: ((true = ?x) || ?y , ?z) = ?x Il ?y , ?z 


These can be regarded as providing implicit definitions of terms with the 
operator = and of case expressions with hypotheses which are not equations. 
Though these are useful constructions, it is worth noting that they are not an 
essential part of the underlying theory. 

The following propositions are easy consequences of the six axioms given so
far:


E1: (?a = ?a) = true
E2: (true = false) = false
B1: (true || ?x , ?y) = ?x
B2: (false || ?x , ?y) = ?y


Watson has E1 and B1-2 as built-in assumptions instead of P1-2; E2 is
provable from these and EQ, GH, as are P1-2.

The axiom HYP allows substitutions to be made in the left branch of the 
hypothesis under the locally valid assumption that the hypothesis is true; it 
may seem that we have neglected similar things one can do in the right branch 
using the assumption that the hypothesis is false, but this is not the case! We 
present three theorems, the last of which captures the use of negative hypotheses: 


T0: ((?a = ?b) || ?x , ?x) = ?x

T1: ((?a = ?b) || A{((?a = ?b) || ?x , ?y)/?v} , ?z) =
    (?a = ?b) || A{?x/?v} , ?z

T2: ((?a = ?b) || ?x , A{((?a = ?b) || ?y , ?z)/?v}) =
    (?a = ?b) || ?x , A{?z/?v}


While the axiom HYP allows us to rewrite only in the left branch of a case
expression, the theorems T1 and T2 allow rewriting in both the left and
right branches of case expressions, though of a more restricted nature.

We omit the easy proofs of T0 and T1. We do prove T2.


(?a = ?b) || ?x , A{((?a = ?b) || ?y , ?z)/?v} = (EQ)

((?a = ?b) || true , false) || ?x ,
    A{(((?a = ?b) || true , false) || ?y , ?z)/?v} = (substitution)

(?u || ?x , A{(?u || ?y , ?z)/?v})
    {((?a = ?b) || true , false)/?u} = (DIST)

(?a = ?b) ||
    ((?u || ?x , A{(?u || ?y , ?z)/?v}){true/?u}) ,
    ((?u || ?x , A{(?u || ?y , ?z)/?v}){false/?u}) = (substitution)

(?a = ?b) ||
    (true || ?x , A{(true || ?y , ?z)/?v}) ,
    (false || ?x , A{(false || ?y , ?z)/?v}) = (B1 and B2)

(?a = ?b) || ?x , A{(false || ?y , ?z)/?v} = (B2)

(?a = ?b) || ?x , A{?z/?v}


which completes the proof of T2. 

Though we regard the formulation using DIST and HYP as more mathematically
elegant, it should be noted that taking HYP and T0-2 as primitive
assumptions is equivalent, and this is the axiomatization which is effectively
hard-wired into the prover. We omit the short proof of DIST from T0-2.

Propositional connectives are readily defined using expressions defined by 
cases. We give only the definition of negation as an example. 


Definition: ~T is defined as T || false, true 


It is worth remarking that in case T is neither true nor false, this definition 
treats it in the same way as false (and so ~T is equal to true in this case). 
Similar considerations apply to the other propositional connectives. 
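
As a small illustration of this remark (a sketch only; the Python names case
and neg are ours, and Python values stand in for the objects of the logic):

```python
TRUE, FALSE = "true", "false"


def case(h, t, u):
    """Value of h || t , u: t when h is true, u in every other case."""
    return t if h == TRUE else u


def neg(t):
    """~T, defined as T || false , true."""
    return case(t, FALSE, TRUE)


# 42 is a hypothetical object that is neither true nor false.
print(neg(TRUE), neg(FALSE), neg(42))
```

Under this reading neg(neg(42)) is false rather than 42, so double negation
returns its argument only on truth values.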

We now prove a completeness theorem for the logic of case expressions in its 
intended interpretation. 

A theory in the language of W is a set of equations between terms in the
language of W (with some fixed set of constants and constant operators), closed 
under the application of the rules of equational logic (reflexivity, symmetry, 
transitivity, localization and specification). 


Definition: If M is a set with more than one element, an environment for M
relative to a theory is a map from the free variables of the language of W to
elements of M.

An interpretation of the theory in M is a map which takes any pair consisting
of an environment for M and a term to an element of M.
An interpretation I is said to be sound for a theory if the following conditions
hold:
1. If σ is an environment and v is a free variable, then I(σ,v) = σ(v).
2. If t and u are terms such that t = u is an equation in the theory, and σ
is any environment, then I(σ,t) = I(σ,u).
3. I(σ,true) ≠ I(σ,false) for any σ.
4. If t is a term containing no free variables, then I(σ,t) = I(σ',t) for any
environments σ and σ'.


We now prove a 


Completeness Theorem: Any theory which does not have true = false as
an element has a sound interpretation in a set M which is at most countably
infinite.

Proof: We construct an interpretation whose range is the set of equivalence 
classes of variable-free terms of the language of the theory under a suitable 
equivalence relation. 

Terms T and U will be equivalent if they are provably equal in the theory. 
This is not sufficient to define the desired equivalence relation, because the 
theory may not be complete (it may not allow the decision of all equations). 
To define the complete theory, enumerate all equations between variable-free
terms in its language. Consider the first equation T = U on this list
with the property that neither T = U nor ((T = U) || true , false) = false
is a theorem. It is straightforward to establish that T = U is a theorem
iff ((T = U) || true , false) = true is a theorem, and so that T = U and
((T = U) || true , false) = false cannot both be theorems (because symmetry
and transitivity of equality would force true = false to be a theorem,
contrary to hypothesis).

We claim that adding T = U to the theory and closing under the application 
of the rules of equational logic will still produce a consistent theory (true = 
false will not belong to the extended theory). Suppose otherwise: then we 
would have a proof 


true = V1 = ... = Vn = false


with each step justified by an element of our theory or the equation T = U 
(possibly combined with an application of the rule of localization). 
We could then modify this proof to the following form: 


(T=U) || true , false =
(T=U) || V1 , false =
...
(T=U) || Vn , false =
(T=U) || false , false = (T0)
false


Each step of this proof would be valid in the original theory: steps using 
equations of the original theory obviously remain valid and the steps using 
T = U would be justified by applications of the axiom HYP of the logic of 
case expressions. So the original theory would prove ((T=U) || true , false)
= false, which we have seen is impossible.

We can then repeat this process to obtain a complete theory (one which
decides every equation). The resulting complete theory allows us to define a
sound interpretation I in the set of equivalence classes of variable-free terms
of the language; I(σ,t) will be the equivalence class of the term obtained
from the term t by replacing each free variable v occurring in t with some
element of the equivalence class of terms σ(v). Since the language itself is
no more than countably infinite, any partition of the set of terms of the
language is likewise no more than countably infinite. The proof is complete.


It follows from the Completeness Theorem and the definability of notions 
of propositional logic and the logic of identity in the logic of case expressions 
given here that this logic codes all valid reasoning in propositional logic and the 
logic of identity (as claimed above). Complete implementations of propositional 
logic reasoning in several styles have been made in Watson, using the principles 
described here. 

The definition of another version of this logic of case expressions and the 
proof of its completeness are found in our unpublished [4]. 

We are well aware that the use of an if...then...else... construction as a
primitive in the definition of logical connectives is not novel. We have not seen it
widely advertised that the four basic axioms (which we repeat here in standard
notation for emphasis) together with rules of equational logic are sufficient to
provide a basis for propositional logic and the logic of identity in an untyped
context.


P1: (if x = x then y else z) = y

P2: (if true = false then y else z) = z

HYP: (if a = b then F(a) else c) = (if a = b then F(b) else c)

DIST: F(if a = b then c else d) = (if a = b then F(c) else F(d))


5 Bound Variables and Substitution 


We introduce the notation for variable binding used in the prover, which is a 
system of the sort introduced by de Bruijn (in [2]) with “nameless dummies”, 
though it is not the usual scheme of “de Bruijn indices”. We also introduce 
the formal definition of substitution for this system and extend the rules of 
equational logic to the language as extended with bound variables. 

The construction of functions is the only variable binding construction in
the logic of Watson. There are two different kinds of function application: set
function application, represented by the infix operator @, and class function
application, represented by the infix operator @!. The same variable binding
construction builds both set and class functions; there is a syntactical constraint
on permitted occurrences of @! in functions, but no corresponding restriction
on permitted occurrences of @. The application of the β-reduction rule is more
restricted for the @ operator than for @!, as will be discussed in the next section.

We recall that an atomic term consisting of ? followed by a non-zero-initial 
numeral is a bound variable. 

An occurrence of a term in Watson is said to have “level n” if it occurs as
a subterm of n abstraction terms (if it is enclosed in n pairs of brackets, on
a typographical level). The bound variable ?n cannot occur sensibly at a level
lower than n (where the two occurrences of “n” in different type faces represent a
positive integer and its numeral). The intended semantics is that an abstraction
term [T] occurring at level n - 1 represents a λ-term (a function) in which
the bound variable is ?n: so for example the term [?1] at level 0 stands for the
function (λx.x); the term [[?1]] (at level 0) is (λx.(λy.x)), the map which sends
x to the constant function of x (the K combinator), while [[?2]] is (λx.(λy.y)),
the constant function whose value is the identity function. The term [[?3]] has
no semantics at level 0 except as a subterm of a larger term; we cannot see the
bracket that binds the bound variable ?3.

We formally qualify the notion of term to facilitate discussion of subterms
of nontrivial level: a “level n term” is a term in which each bound variable ?m
occurs enclosed in at least m - n brackets. Note that any level n term is also
a term of level m for each m > n, though the semantics of typographically
identical terms may be different at different levels. Level 0 terms are the terms
which have sensible semantics in a top-level term; level n terms are those terms
which can appear in a level 0 term inside n enclosing brackets. In a term being
considered as a level n term, we will speak of subterms enclosed in m brackets
as occurring at level m + n (tacitly assuming that there are n more brackets
somewhere in a larger context).


This scheme is closer to the usual variable binding scheme than the familiar
scheme of “de Bruijn indices” (we have seen the scheme we use referred to as
“de Bruijn levels”): a term in the usual λ-calculus can be converted to this form
by renaming the outermost bound variables to ?1, the next-to-outermost bound
variables to ?2, etc., then replacing all the binders (λx. ...) with brackets. An
advantage of this scheme over de Bruijn indices is that instances of the same bound
variable always look the same; a disadvantage is that terms with bound variables
will have to have the bound variables renumbered when they are substituted into
a context at a different level.
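
The conversion just described can be sketched in Python. This is an
illustration, not the prover's code: the tuple encoding of named λ-terms, the
function name to_levels, and the choice to print application with @ are all
assumptions made for the example.

```python
def to_levels(term, env=None, depth=0):
    """Render a named lambda-term in bracket notation.

    The variable bound by the k-th enclosing binder, counting from the
    outside, is written ?k; free variables keep their names.
    """
    env = {} if env is None else env
    kind = term[0]
    if kind == "var":
        name = term[1]
        return "?" + str(env.get(name, name))
    if kind == "lam":
        _, name, body = term
        # The new binder sits at depth + 1, so its variable becomes ?(depth + 1).
        return "[" + to_levels(body, {**env, name: depth + 1}, depth + 1) + "]"
    if kind == "app":
        _, fun, arg = term
        return "(" + to_levels(fun, env, depth) + " @ " + to_levels(arg, env, depth) + ")"
    raise ValueError(kind)


identity = ("lam", "x", ("var", "x"))
k_combinator = ("lam", "x", ("lam", "y", ("var", "x")))
const_identity = ("lam", "x", ("lam", "y", ("var", "y")))
print(to_levels(identity), to_levels(k_combinator), to_levels(const_identity))
```

The three sample terms are the ones discussed earlier in this section: the
identity function, the K combinator, and the constant function whose value is
the identity function.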


Practical experience with using this system suggests that as long as brackets 
are not too deeply nested the notation is intelligible. In the current Watson 
theory package, an operator . is provided with the defining axiom (?x.?y) = 
?y (ignore the first argument); a tactic is provided which converts every bracket 
term [T] in the current context to the form [?n.T], where ?n is the appropriate 
bound variable. There is a converse tactic to get rid of such annotations. The 
development of a tactic of this kind under Watson is easy, and it restores the 
advantages of the usual variable binding notation with a binder at the head 
of the term (if one doesn’t mind having the names of one’s bound variables 
chosen for one). Note that the introduction and removal of such annotations is 
automated; if a user introduces incorrect annotations by hand, they are easily 
checked and corrected; it is exactly the fact that the semantics of the annotations 
are trivial which makes it possible for prover tactics (which are not allowed to 
change a term in a way which affects its reference) to correct the annotations 
where necessary. 


On a formal level, the introduction of variable binding requires a change in 
the definition of substitution and an extension of the rules of equational logic. 


If A and B are terms of the same level n and ?x is a free variable, we define a
term A{B/?x} of the same level n. Where m and n are numerals with m > n and
B is a level n term, we define B{m/n} as the term which results when each bound
variable ?i in B with index i > n is replaced by ?j with index j = i + m - n.
(A variable which is bound by a bracket in B (a ?i with i > n) will no longer
be associated with the correct bracket if the term B is substituted into a context
enclosed in m brackets instead of n brackets: it will need to have its numbering
shifted by m - n.) The refinement in the definition of A{B/?x} is that each
occurrence of ?x needs to be replaced with (B{m/n}) rather than simply (B),
where m is the level of that occurrence of ?x.
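
A minimal Python sketch of this definition follows. The tuple encoding of
terms (bvar, fvar, abs, and a single binary constructor app) and the function
names shift and subst are assumptions made for the illustration; Watson's own
data structures are not described here.

```python
def shift(term, m, n):
    """B{m/n}: add m - n to every bound variable index greater than n."""
    kind = term[0]
    if kind == "bvar":
        return ("bvar", term[1] + m - n) if term[1] > n else term
    if kind == "fvar":
        return term
    if kind == "abs":
        return ("abs", shift(term[1], m, n))
    if kind == "app":
        return ("app", shift(term[1], m, n), shift(term[2], m, n))
    raise ValueError(kind)


def subst(a, b, x, n, level=None):
    """A{B/?x} for level n terms A and B."""
    level = n if level is None else level   # level of the current position in A
    kind = a[0]
    if kind == "fvar":
        # An occurrence of ?x at level m is replaced by B{m/n}.
        return shift(b, level, n) if a[1] == x else a
    if kind == "bvar":
        return a
    if kind == "abs":
        return ("abs", subst(a[1], b, x, n, level + 1))
    if kind == "app":
        return ("app", subst(a[1], b, x, n, level),
                subst(a[2], b, x, n, level))
    raise ValueError(kind)


# A is [?x applied to ?1] and B is [?1], both level 0 terms.
A = ("abs", ("app", ("fvar", "x"), ("bvar", 1)))
B = ("abs", ("bvar", 1))
print(subst(A, B, "x", 0))
```

In the example the copy of B lands inside one bracket, so its bound variable
?1 is renumbered to ?2 and remains attached to its own bracket.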


The definition of A{B/?x} can be extended to the case where the level n of B
is greater than the level of A, just in case no occurrence of the variable ?x in
A is at a level less than n. The form of the definition is exactly the same in this
case; this extension is needed for the formalization of the rule of localization.

We present extended axioms of equational logic for W, defining notions of
theorem at each level n. Of course, our true theorems are the level 0 theorems.


reflexivity: For any level n term A, A = A is a level n theorem.

symmetry: If A = B is a level n theorem, then B = A is a level n theorem.

transitivity: If A = B is a level n theorem and B = C is a level n theorem, then
A = C is a level n theorem.

localization: If A = B is a level n theorem and C is a level m term in which
all occurrences of ?x are at level n or higher (so n > m) then C{A/?x} =
C{B/?x} is a level m theorem (?x being any free variable). Notice that the
definition of substitution handles any needed renumbering of bound variables
in the equation A = B that is applied.

specification: If A = B is a level n theorem and C is a level m term then A{C/?x}
= B{C/?x} is a level n theorem (?x being any free variable).

level conversion: If A = B is a level n theorem and m > n, then A{m/n} =
B{m/n} is a level m theorem.


The harmonization of this system with the logic of case expressions given 
above amounts to recognizing that the new definition of substitution needs to 
be used. There is no essential change in the proof of completeness; it goes the 
same way mod renumbering of bound variables in the equation T = U mentioned 
in that proof when used at different levels. 

We now develop the defining axiom of the class map application operator @!.
Where ?n is a bound variable, T is a level n term and U is a level n - 1 term,
we define T{U/?n} as the level n - 1 term which results if all occurrences of ?n
in T are replaced by (U) and all occurrences of bound variables ?i with i > n in
T are replaced by ?j with j = i - 1. We can then state the rule of β-reduction
in the very natural form:


(class) β-reduction: ([T] @! U) = T{U/?n} is a level n - 1 theorem for each
level n term T and level n - 1 term U.
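
The operation T{U/?n} can be sketched in Python as follows. The tuple
encoding and the names shift and beta are assumptions made for the
illustration. One further assumption should be stated plainly: when an
occurrence of ?n lies inside additional brackets within T, the sketch
renumbers the bound variables of U in the manner of the earlier definition of
A{B/?x}; this is our reading of "replaced by (U)" and is not spelled out in
the definition above.

```python
def shift(term, m, n):
    """U{m/n}: add m - n to every bound variable index greater than n."""
    kind = term[0]
    if kind == "bvar":
        return ("bvar", term[1] + m - n) if term[1] > n else term
    if kind == "fvar":
        return term
    if kind == "abs":
        return ("abs", shift(term[1], m, n))
    return ("app", shift(term[1], m, n), shift(term[2], m, n))


def beta(t, u, n, level=None):
    """T{U/?n} for a level n term T and a level n - 1 term U."""
    level = n - 1 if level is None else level   # level of this position in the result
    kind = t[0]
    if kind == "bvar":
        if t[1] == n:
            return shift(u, level, n - 1)       # assumed renumbering of U
        return ("bvar", t[1] - 1) if t[1] > n else t
    if kind == "fvar":
        return t
    if kind == "abs":
        return ("abs", beta(t[1], u, n, level + 1))
    return ("app", beta(t[1], u, n, level), beta(t[2], u, n, level))


IDENTITY = ("abs", ("bvar", 1))                  # [?1]
# [[?1]] applied to [?1]: the body T is [?1], a level 1 term.
print(beta(("abs", ("bvar", 1)), IDENTITY, 1))
# [[?2]] applied to [?1]: the body T is [?2].
print(beta(("abs", ("bvar", 2)), IDENTITY, 1))
```

The first result encodes [[?2]], the constant function whose value is the
identity, and the second encodes [?1], the identity itself, as one expects
from applying K and the constant-identity function to the identity.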


Another form in which this could be stated (without introducing new notation,
but that is its only merit!) is “([T{n/(n-1)}{?n/?x}] @! U) = T{U/?x}
is a level n - 1 theorem for any level n - 1 terms T and U”.

So far we appear to have axiomatized untyped λ-calculus, which is incompatible
with the presence of functions without fixed points, such as negation, which
we can already define. Paradox is avoided by a restriction on the formation of
abstraction terms containing the operator @!. We define the head of a term and
its number of arguments as follows: if a term T is not of the form U @! V, then
it is its own head and has 0 arguments; if a term T is of the form U @! V, then
its head is the head of U and it has one more argument than U. We define an
n-function as follows: a term not of the form [T] is a 0-function and a term of
the form [T] is an (n + 1)-function iff T is an n-function. The restriction on the
formation of abstraction terms is that if a term with n arguments appears as a
subterm of an abstraction term, its head must be either an n-function or a free
variable. Notice that under this condition any heads of subterms of abstraction
terms which are n-functions for n > 0 can be eliminated by β-reductions. For
example, this restriction forbids the formation of fixed points of arbitrary operators
F by self-application of abstraction terms [F @! (?1 @! ?1)], because
?1 @! ?1 is prevented from occurring as a subterm of an abstraction term.
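
The notions of head, number of arguments and n-function are easy to compute.
The Python sketch below is an illustration under assumptions: the tuple
encoding (with capp standing for @!) and the function names are ours, and
head_permitted checks the stated condition for a single application term
only. How the condition is meant to apply to terms with no arguments, or to
partial applications inside a longer application, is not settled by the text
above and is not modeled.

```python
def head_and_args(t):
    """Return the head of t and its number of arguments."""
    count = 0
    while t[0] == "capp":           # t has the form U @! V
        t, count = t[1], count + 1
    return t, count


def function_order(t):
    """The n for which t is an n-function: its number of leading brackets."""
    n = 0
    while t[0] == "abs":
        t, n = t[1], n + 1
    return n


def head_permitted(t):
    """Head condition for one application term inside an abstraction."""
    head, count = head_and_args(t)
    return head[0] == "fvar" or function_order(head) == count


SELF_APP = ("capp", ("bvar", 1), ("bvar", 1))           # ?1 @! ?1
REDEX = ("capp", ("abs", ("bvar", 2)), ("bvar", 1))     # [?2] @! ?1
FREE_HEAD = ("capp", ("fvar", "f"), ("bvar", 1))        # ?f @! ?1
print(head_permitted(SELF_APP), head_permitted(REDEX), head_permitted(FREE_HEAD))
```

The self-application ?1 @! ?1 is rejected because its head is a bound
variable, which is a 0-function with one argument and is not a free variable.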


6 The Theory of Class Abstraction 


We have seen above that the operations of propositional logic can be interpreted 
in the logic of case expressions. We now use the class function machinery of 
Watson to interpret quantification. 

The essential idea is that the universal quantifier can be interpreted as the
function forall defined as [?1 = [true]] (i.e., (λx.(x = (λy.true))); there is
nothing new about this idea!). If a formula φ is represented by a term T, then
the formula (∀x.φ) will be represented (mod technicalities about the variable
binding) by [T] = [true] = forall @! [T]. If the formula φ is represented by
a term T, the formula (∃x.φ) will be represented by the term ~([T] = [false]).
So we define forsome as [~(?1 = [false])] (the definition of forsome is not
quite as nice as that of forall, because it does not mean quite what one would
like when T takes on values which are not truth-values). In any event, forall
@ [T] and forsome @ [T] will code the intended quantified statements when T
codes a formula (and so has boolean value).

We now demonstrate that first order logic with equality on infinite domains 
is captured exactly by the logic of case expressions augmented with our scheme 
of class functions. The precise sense in which this is true is as follows: we can take 
any countably infinite model of a first order theory and introduce a definition 
of the class function abstraction and application operations which will satisfy 
the formal rules of this system and under which the internal definitions of the 
quantifiers will succeed. 

Take any first order theory T (with equality) with a finite or countably 
infinite set of primitive predicates, constants and function symbols and having 
a countably infinite model M. We indicate how to represent the machinery of 
class abstraction within M in such a way that the definitions of the quantifiers 
in terms of class abstraction succeed. 

We partition the countably infinite set M into a sequence of countably infinite
subsets M_i indexed by the natural numbers.

We need to translate the language of T into the language of W. Select two 
distinct elements of the model M as referents of the terms true and false. 
Propositions of the language of T will be interpreted as terms with values true 
or false. Introduce constants (in the sense of W) translating each constant of T. 
Introduce operators translating each predicate and function symbol of T (equa- 
lity is translated by =). Unary predicates or function symbols will correspond 
to prefix operators, and binary predicates or function symbols will correspond 
to infix operators. If there are operators of ternary or higher arity, these can
be accommodated by introducing the pair operator (,); for example, an atomic
sentence with a ternary predicate symbol like Rxyz would have the translation
x ‘R y , z. The pair operator can represent any injection from M × M into M.


We define a class L_0 of terms in the language of W containing interpretations
of all terms and quantifier-free propositions of the language of T. L_0 contains all
free variables and translated constants of T, plus true and false. It is closed
under the construction of terms with (translated) predicates and function symbols
of T (plus = and ,) and the construction of case expressions (used to handle
propositional connectives). It is the smallest class of terms satisfying these closure
conditions.


For each term T in L_0 which contains no free variable other than a fixed
variable ?x, construct the abstraction term [T{1/0}{?1/?x}] (after this point,
we abbreviate this as [T{?1/?x}]). These abstraction terms are permitted in
W, because no such term T will contain any occurrence of @!; this will remain
true throughout the iterative construction we are about to carry out. Each such
abstraction term corresponds in a natural way to a function from M to M; we
assign an element of M_0 as the referent of each such abstraction term, assigning
the same referent to terms which correspond to the same function. This will
succeed because there are clearly no more than countably infinitely many such
terms.


Our intention is now to augment our language by adding all the abstraction
terms we have just constructed as new constants. We extend the language L_0
to include all abstractions over terms of L_0 (notice that this is not the same as
taking the closure of L_0 under the abstraction term construction!); we call this
extended language L_1 (it is clearly harmless to allow free variables in abstraction
terms; the reference of an abstraction term with free variables in it will be
determined once the reference of each free variable is determined). Notice that for
each term T of L_0 which codes a proposition φ, we have [T{?1/?x}] = [true],
which codes (∀x.φ), as a term of L_1; the language L_1 allows us to express some
quantified sentences.


We then proceed in the same way through steps indexed by the natural
numbers. When the language L_n has been constructed, we consider the set of
all terms T of L_n which contain no free variable other than ?x. We construct
abstraction terms [T{?1/?x}] for each such term. Each such abstraction term
corresponds to a function from M to M. Some of these terms will correspond to
functions with the same extension as an abstraction term already defined; assign
these the same referent as the term(s) with which they are coextensional. Assign
to each term which has a “new” extension a referent in M_n (none of whose
elements will have been used yet as referents for abstraction terms), assigning
terms with the same extension the same referent. It will be possible to do this
because the class of abstraction terms being considered is no more than countably
infinite. We extend the language L_n with all abstraction terms [T{?1/?x}] for
T a term of L_n, obtaining the language L_{n+1}; we are able to determine the
reference of any term of L_{n+1} once the reference of any free variables in the
term is given. Notice that if any proposition φ is coded by a term T of L_n, the
proposition (∀x.φ) will be coded by the term [T{?1/?x}] = [true] in L_{n+1}.

We consider the language L_ω, the union of all the languages L_n. We augment
L_ω with the class application operator @!, using the β-reduction scheme
to determine the meaning of terms [T] @! U and assigning a default value to
each term T @! U where the referent of T is not the referent of any abstraction
term. L_ω allows us to abstract freely over terms which do not contain @! (this
should be clear from the construction); abstraction over terms with a subterm
with n arguments and head an n-function makes sense because the occurrences
of @! in such subterms can be eliminated by repeated β-reduction, obtaining
an abstraction term provided by L_ω; abstraction over terms with a subterm
with n arguments and head a free variable makes sense as long as we stipulate
that such free variables are implicitly typed as n-functions (the prover enforces
this). So the abstraction terms allowed by W are all interpretable, since the
abstraction terms in L_ω are interpretable. Note further that L_ω, although we have
only directly interpreted quantifier-free sentences of T, actually provides indirect
interpretations for all quantified sentences of T.

This discussion establishes that the notions of class abstraction can be added 
harmlessly (as a conservative extension) to any first-order theory with an infinite 
model. It further needs to be shown that the deductive machinery of the theory of 
class abstraction is strong enough to recover the usual properties of quantifiers. 
This is best seen by pointing out that both of the usual rules for universal 
quantifiers can be emulated in the theory of class abstraction: 


(forall @ [T]) = true            premise
([T] = [true]) = true            definition of forall
[T] = [true]                     simple case expression reasoning
([T] @ ?x) = [true] @ ?x         localization
T{?x/?1} = true                  beta-reduction on both sides


demonstrates universal instantiation. 


T = true                         level 0 theorem (premise)
T{1/0} = true                    level 1 theorem, by level conversion
T{1/0}{?1/?x} = true             level 1 theorem, by specification
[T{1/0}{?1/?x}] = [true]         level 0 theorem, by localization
forall @ [T{1/0}{?1/?x}]         definition of forall


demonstrates universal generalization. 

Universal generalization and universal instantiation, combined with propositional
logic which we know we can interpret using the logic of case expressions,
is enough to verify the rules for the existential quantifier.

Generalized predicates can be represented by free variables in this system 
(free variables appearing as heads of application terms) but such free variables 
cannot be replaced with bound variables, which prevents the representation of 
quantification over predicates: there is no second-order logic here. 


7 Higher Order Logic via Stratified Abstraction


The “λ-calculus” described in the previous section is a late innovation in the
logic of Watson. It is a quite weak system. Watson incorporates another much
stronger λ-calculus, equivalent in strength and expressive power to a safe variant
of Quine’s set theory “New Foundations”, on which it is based. It is also equivalent
in strength to Church’s simply typed λ-calculus with an axiom of infinity
(with refinements introduced below, it is somewhat stronger). It differs from the
Church system in being untyped. It can be noted here that the entire logic W
is untyped (except for the implicit typing of free variables appearing as heads
of curried class application terms). The stronger λ-calculus is called “stratified
λ-calculus”, and is discussed at length in [5], [6], and [7]. Here our treatment will
be briefer.

The stratified λ-calculus is best understood initially via a related typed system,
a fragment of the simple type theory of Church (see [1]). The fragment
has types indexed by the natural numbers: type 0 is the type ι of individuals
(of unspecified character), and type n + 1, for each n, is the type (n → n) of
functions from type n to type n. In addition, type 0 has at least two distinct
elements and each type satisfies the type identity (n × n) = n: i.e., each type
supports an ordered pair (Watson actually assumes (n × n) ⊆ n; surjectivity of
the pair is not assumed).

This type system shares a characteristic with Russell’s type theory of sets
which Church’s simple type theory does not have: all the types look the same in
a certain sense. If one raises each type index in an axiom of this typed λ-calculus,
one obtains another axiom; it is easy to see from this fact that the same holds for
theorems. This suggests that it is reasonable to suppose that the whole structure
consisting of types 0,1,2... is isomorphic to the structure consisting of types
1,2,3..., or even that the type distinctions can be collapsed completely. This
is the same as the motivation for the modification of Russell’s theory of types
for sets which gives Quine’s set theory “New Foundations”.

It turns out that the ability to safely collapse a type theory using polymorphism
in this way is sensitive to details of its axiomatization. If one assumes full
extensionality (that every object is a λ-term) it is an open question whether the
collapse can be carried out (equivalent to the open question of the consistency
of NF). If one does not assume extensionality, one obtains a theory which is
known to be consistent and essentially equivalent to Jensen’s variation NFU +
Infinity of “New Foundations”, which has the same consistency strength and
expressive power as Russell’s theory of types or Church’s simple theory of types
(with infinity). This collapsing process is discussed in detail in [6].

The theory obtained when the type structure is collapsed is one-sorted (objects
of the theory are not typed), but a notion of “relative type” still plays an
important role in the theory. The point is that when the type distinctions are
collapsed one still has only those instances of the scheme of β-reduction [T]@U
= T{U/?1} which make sense in terms of the type scheme. This is vitally important:
one does not want to acquire instances of β-reduction like [~?1@?1]@?x
= ~?x@?x, from which, if one defines R as [~?1@?1], one can deduce the
disastrous theorem R@R = ~R@R. But the abstraction term [~?1@?1] is in some
sense illicit, because there is no way to type it sensibly in terms of the typed
system described above.

We now describe the way that these ideas are implemented in the logic W of
Watson. Each operator is supplied with “relative types” for its arguments (called
the left type and right type): if an operator % (infix for the sake of the example)
has left type i and right type j, this tells us that if a term T % U were of type n
in the typed system, then T would be type n + i and U would be type n + j. For
example, the function application operator @ has left type 1 and right type 0,
because a type n + 1 function is applied to a type n + 0 argument to get a type n
term. It should be noted that negative relative types are possible: for example,
a singleton set operator would have type -1 for its sole argument.

There is an additional option: some operators (such as the class application 
operator @!) are “opaque”; abstraction into an opaque context is not allowed in 
stratified abstraction terms. 

The machinery of relative types is used to identify function abstracts which
are allowed in the function abstraction scheme for the application operator @.
Such abstraction terms are said to be “stratified” by analogy with terminology 
used in “New Foundations” and related set theories for formulas permitted in 
set abstracts. 

Formal definitions of relative type and stratification follow: 


Definition: Occurrences of subterms of a term (with exceptions in opaque
contexts) are said to have “relative type” in that term. Relative type is defined
recursively:


1. The relative type of a term in itself is 0.

2. If the relative type of an occurrence of the term A in a term T is n, and
the left (resp. right) type of the operator % is i, then the relative type
of the analogous occurrence of A in the obvious occurrence of T in T %
U (resp. U % T or % T (in the case of a unary operator)) is n + i. If % is
opaque, then the relative type of the analogous occurrence of A in the
obvious occurrence of T in any of these terms is undefined.

3. If the relative type of an occurrence of the term A in a term T is n, then
the relative type of the analogous occurrence of A in the occurrence of T
in [T] is n - 1.

4. The relative type of an occurrence of A in T || U , V is the same as the
relative type of its occurrence in the appropriate one of T, U, V.


Definition: An abstraction term [T] is “stratified” if the relative type in T
of each occurrence (there need not be any occurrences) of the variable ?n
bound by the brackets is defined and equal to 0, and if each abstraction term
appearing as a proper subterm of [T] is stratified.

Axiom scheme (stratified β-reduction): [T] @ U = T{U/?n} is a level n
axiom, when [T] is a stratified abstraction term.


Note that a term like [~?1 @ ?1] is actually well-formed, and we do have
([~?1 @ ?1] @! U) = ~U @ U, but we do not have the disastrous ([~?1 @ ?1] @ U)
= ~U @ U, because [~?1 @ ?1] is not stratified. Notice also that because the class
application operator @! is “opaque”, one cannot define functions in the higher
order logic which depend in any nontrivial way on facts about class application.
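
The stratification test can be sketched in code. The following Java fragment is only an illustration under stated assumptions, not Watson's implementation: it covers infix operators and abstraction (clauses 1 to 3 of the definition) and omits unary operators and case expressions; it assumes that the variable ?n is bound by the brackets at nesting depth n, as the examples in the text suggest; and the relative types given to = are an assumption. All class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of relative types (clauses 1-3) and the stratification test. */
class StratificationSketch {
    /** Relative types of an operator's arguments; opaque operators leave them undefined. */
    static final class Op {
        final int left, right;
        final boolean opaque;
        Op(int left, int right, boolean opaque) { this.left = left; this.right = right; this.opaque = opaque; }
    }

    static final Map<String, Op> OPS = new HashMap<>();
    static {
        OPS.put("@", new Op(1, 0, false));  // set function application, as in the text
        OPS.put("@!", new Op(0, 0, true));  // class application is opaque; its numbers are unused
        OPS.put("=", new Op(0, 0, false));  // assumption: both sides of an equation share a type
    }

    abstract static class Term {}
    static final class Var extends Term { final int n; Var(int n) { this.n = n; } }
    static final class Const extends Term { final String name; Const(String name) { this.name = name; } }
    static final class Infix extends Term {
        final Term left; final String op; final Term right;
        Infix(Term left, String op, Term right) { this.left = left; this.op = op; this.right = right; }
    }
    static final class Abs extends Term { final Term body; Abs(Term body) { this.body = body; } }

    /** True iff every occurrence of ?n in t has relative type 0; type is t's own relative type (null = undefined). */
    static boolean boundVarAtZero(Term t, int n, Integer type) {
        if (t instanceof Var) return ((Var) t).n != n || (type != null && type == 0);
        if (t instanceof Const) return true;
        if (t instanceof Abs) return boundVarAtZero(((Abs) t).body, n, type == null ? null : type - 1);
        Infix i = (Infix) t;
        Op op = OPS.get(i.op);
        boolean undefined = type == null || op.opaque;
        return boundVarAtZero(i.left, n, undefined ? null : type + op.left)
            && boundVarAtZero(i.right, n, undefined ? null : type + op.right);
    }

    /** [T] binding ?level is stratified if ?level sits at type 0 in T and all nested abstractions are stratified. */
    static boolean stratified(Abs a, int level) {
        return boundVarAtZero(a.body, level, 0) && nestedStratified(a.body, level + 1);
    }

    static boolean nestedStratified(Term t, int level) {
        if (t instanceof Abs) return stratified((Abs) t, level);
        if (t instanceof Infix) {
            Infix i = (Infix) t;
            return nestedStratified(i.left, level) && nestedStratified(i.right, level);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(stratified(new Abs(new Infix(new Var(1), "@", new Var(1))), 1));      // [?1 @ ?1]
        System.out.println(stratified(new Abs(new Infix(new Const("f"), "@", new Var(1))), 1)); // [f @ ?1]
    }
}
```

In this sketch the self-application [?1 @ ?1] is rejected because its left occurrence of ?1 has relative type 1, while [f @ ?1] is accepted; an occurrence under the opaque operator @! is rejected because its relative type is undefined.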

An important note here is that it might be thought to be dangerous that 
we have made set function application and class function application coincide 
for stratified abstraction terms. This turns out not to be a problem as long 
as one provides enough non-functions (terms T not equal to [T @ ?1]). The
only real curiosity here is that one can prove that set function application is
nonextensional by considering class abstractions like [~?1 @ ?1] which would
be paradoxical if they were also set abstractions.


8 Experience with Watson 


It is possible to develop the theory of quantification using the machinery of
the stratified λ-calculus alone (and this is how it was done originally). If the
class application operator were not used, unstratified abstraction terms would
be treated as ill-formed (there is a current release of Watson which still takes this
approach). The representation of (∀x.φ) as forall @ [T] and the verification
of instantiation and generalization would still work, with the restriction that we 
would only consider formulas represented by stratified terms. It is known that all 
stratified theorems of systems like “New Foundations” or NFU have proofs which 
involve only stratified sentences, and most sentences of mathematical interest are 
stratified; this restriction did not initially seem to be a problem in practical work 
with the prover, except for a technical problem detailed below. 

There is a technical difficulty with the implementation of first-order logic
using stratified λ-calculus which must be noted. A sentence like (∀x.(∃y.x = y)),
which is regarded as “stratified” in the context of a set theory like “New
Foundations”, is represented in the language of Watson by a term forall @ [forsome
@ [?1 = ?2]] which is not on the face of it stratified! If the whole term is
assigned type 0, the subterm forall gets type 1 and the subterm [forsome @ [?1
= ?2]] gets type 0; thus the subterm forsome @ [?1 = ?2] gets type −1, from
which we see that forsome gets type 0 and [?1 = ?2] gets type −1. We then see
that ?1 = ?2, and so both ?1 and ?2, get type −2. The rules of stratification
require that the type of ?1 (−2) be the same as the type of the body forsome
@ [?1 = ?2] of the abstraction term in which it is bound, and this is not the
case: the term forsome @ [?1 = ?2] has type −1.

This is a merely technical problem because one can show that the relative
type of a term with a boolean value (like forsome @ [?1 = ?2]) can be freely
raised or lowered by any desired amount to recover stratification. The equations
((P || [true] , [false])) @ 0 = P and ([P] = [true]) = P hold when P
is replaced by either true or false; these equations can be used to freely raise
or lower the type of a term whose value is known to be boolean. Of course, no
one wants to carry out manipulations of this kind explicitly in a theorem proving
system! The solution of this problem was to enable the prover to recognize for
itself subterms belonging to classes on which type raising or lowering is possible
and exploit this information to recognize a more general class of terms as
stratified. With this generalization of stratification, the technical problem outlined
above became entirely invisible to the user.

Classes on which type-raising and lowering is possible (called “strongly can- 
torian sets”, abbreviated s.c.) are of considerable interest in set theories like NF. 
If it is assumed that the set of natural numbers is s.c., it follows that most sets of 
interest in mathematics and certainly all sets of interest in computer science ap- 
plications are s.c. The relaxation of stratification restrictions for natural number 
values and values belonging to common data types proves useful in practice; it 
has the side-effect of making the logic somewhat stronger than the simple theory 
of types with infinity. It is beyond the scope of this paper, but it is worth noting 
briefly that a theme of our research is the study of an analogy between the notion 
of “s.c. set” and the notion of “data type”, and that practical experience with 
Watson seems to indicate that this can be a useful analogy. 

The problems with the treatment using stratified λ-calculus which caused
us to introduce the class application operator were subtler, having to do with
uniform treatment of quantifiers over variables of different relative types in
formal rules for first-order logic. For example, the addition of the class machinery
makes it possible to handle the logical principle (∀xy.φ) ↔ (∀yx.φ) uniformly;
if quantification were implemented using set abstraction, it would be necessary
to take the difference in relative type between x and y into account in each such
equivalence. The introduction of class application and abstraction increases the
ability of the prover to apply limited forms of higher-order matching as well.

The representation of mathematical constructions in this higher-order logic
is very similar to the representation in the fragment of Church’s type theory
described above. Since the latter system is not usually used, some remarks are in
order. The lack of types like ((ι → ι) → ι) and (ι → (ι → ι)) in this fragment of
simple type theory does not create significant problems with expressive power:
both of these types are readily represented in type 2 = ((ι → ι) → (ι → ι)) by
exploiting the coding of any type or subcollection of a type in the linear type
scheme using a collection of constant functions in the next higher type. The
same device works in general to handle types outside the linear type scheme.
Mathematics as implemented in Watson tends to look very much like
mathematics implemented in a typed λ-calculus, except for this kind of occurrence of
the constant function operator to adjust relative types. The declaration of “data
types” as s.c. often allows such occurrences of the constant function operator to
be omitted.


9 Conclusions and Relations to Other Work 


The main purpose of this paper is to document the mathematical underpinnings 
of the Watson theorem prover. We have been more attentive to the features which 
are not documented elsewhere (the logic of case expressions and the new class 
abstraction machinery) than to the higher order logic embodied in the stratified 
λ-calculus. We feel that this system has certain features of independent interest,
however. The use of the logic of case expressions as a foundation for propositional
logic seems interesting to us; certainly the axiomatization is economical. It is less
novel to identify the abstraction implicit in quantification with the abstraction
which constructs functions, and in fact the latest version with class application as
well as set application retreats from such an identification. The development of
Watson has been an outgrowth of our interest in the application of the untyped
set theories in the style of Quine and the related λ-calculi, and we believe that
untyped grand logics ought to be of interest in theorem proving in general.

We are aware that there is other work using deBruijn indices and related 
schemes (including the one given here) which would be technically similar to our 
formal development of substitution, especially by researchers in the area of “ex- 
plicit substitution”. We can only make up for our lack of references by pleading 
ignorance of this work; our development is independent, though certainly not 
original. 

We do not believe that any theorem proving system is very close in its details 
to Watson, and in any event the details of the theorem prover are not relevant to 
this paper. The closest system in terms of its underlying mathematical framework 
is probably HOL ([3]), which implements Church’s classical simple type theory of 
functions. 


Abstract. This paper describes a way of modeling inheritance (in
object-oriented programming languages) in higher order logic. This particular
approach is used in the LOOP project for reasoning about JAVA
classes, with the proof tools PVS and ISABELLE. It relies on nested interface
types to capture the superclasses, fields, methods, and constructors
of classes, together with suitable casting functions incorporating the
difference between hiding of fields and overriding of methods. This leads to
the proper handling of late binding, as illustrated in several verification
examples.


1 Introduction 


This paper reports on a particular aspect of the semantics of object-oriented
languages, like JAVA, used in the “LOOP” verification project [23]. It concentrates
on inheritance. A companion paper [3] explains the underlying memory model.

Inheritance is a key feature of object-oriented languages. It allows a pro- 
grammer to model his/her application domain according to a natural “is-a” 
relationship between classes of objects. One can use inheritance, for instance, to 
say that a lorry is-a vehicle by making a class of lorries a subclass (or “child” 
or “descendant” ) of a superclass (or “ancestor” ) of vehicles. Important aspects 
of inheritance are re-use of code, and polymorphism. The latter is sometimes 
called subtype polymorphism (to distinguish it for example from parametric po- 
lymorphism). Its effect is that the particular implementation that is used in a 
method call is determined by the actual (run-time) type of the receiving object, 
which tells to which class the object belongs. This mechanism is often referred 
to as dynamic method look-up or late binding. It is precisely this dynamic as- 
pect of object-oriented languages which is difficult to capture in a static logical 
setting. Therefore, the semantics of inheritance (as a basis for reasoning about
classes) is a real challenge, see e.g. [5,17,21,6,12,8,18]. There is a whole body of
research on encodings of classes using recursive or existential types, in a suitably
rich polymorphic type theory (like F^ω or F^ω_≤). Four such (functional)
encodings are formulated and compared in a common notational framework in [4].
But they all use quantification or recursion over type variables, which is not
available in the higher order logic (the logics of PVS and ISABELLE/HOL) that
will be used here. Quantification and/or recursion over type variables is available
in LEGO and COQ, but since much proving power is required for verifying
non-trivial properties about actual JAVA programs, we prefer to use tools like PVS
and ISABELLE because they do not have the overhead of explicit proof-objects.
The setting of the encoding in [18] is higher order logic with “extensible re- 
cords”. This framework is closest to what we use (but is still stronger). Also, an 
experimental functional object-oriented language, without references and object 
identity is studied there. This greatly simplifies matters, because the subtle late 
binding issues involving run-time types of objects (which may change through 
assignments, see Section 7) do not occur. Indeed, it is a crucial aspect of impera- 
tive object-oriented programming languages that the declared type of a variable 
may be different from—but must be a supertype of—the actual, run-time type 
of an object to which it refers. This property is called type-safety. Our semantics 
of inheritance works for an existing object-oriented language, namely JAVA, with 
all such semantical complications. 

The explanations below only describe a small part of the denotational seman- 
tics of JAVA used in the LOOP project. Due to space limitations, many aspects 
have to be left unexplained. We intend to concentrate on the main ideas under- 
lying the handling of inheritance, using paradigmatic examples. Many related 
issues, like inheritance from multiple interfaces (which mainly involves proper 
name handling), method overloading, or object creation via constructor chaining, 
are not covered in the present paper. 

For our JAVA verification work a special purpose compiler, called LOOP, for
Logic of Object-Oriented Programming, has been developed. It works as a
front-end to a proof tool, for which both PVS [19] and ISABELLE [20] can be used, as
suggested by the following diagram.


[Diagram: JAVA classes → LOOP compiler → PVS theories or ISABELLE theories → PVS or ISABELLE/HOL proof tool, which also receives user statements → QED]

The LOOP tool translates JAVA classes into logical theories, containing definitions
(embodying the semantics of the classes) plus special lemmas that are used
for automatic rewriting. The tool works on classes which are accepted by a
standard JAVA compiler. It handles almost all of sequential JAVA. The generated
logical theories can be loaded into the back-end proof tool, together with the
so-called semantical prelude, which contains basic definitions, like in Section 3
below. Subsequently, the user can state desired properties about the original JAVA
classes and prove these on the basis of the semantical prelude and the generated
theories. For example, a user may want to prove that a method terminates,
returning a certain value; see Section 8 for several examples.

The semantics that is used is based on so-called coalgebras (see [14,13]). In
general, a coalgebra is a function with type X → F(X), where F describes the
interface, or access points, and X the state space, or set of states. Coalgebras
give rise to a general theory of behaviour for dynamical systems, involving useful
notions like invariance and bisimilarity, but we shall not make use of it here. In
this paper, coalgebras are only used to conveniently combine all the ingredients of
a class, i.e. the fields, methods and constructors, in a single function. Specifically,
n functions f₁: Self → σ₁, ..., fₙ: Self → σₙ with a common domain can be
combined in one function Self → [f₁: σ₁, ..., fₙ: σₙ] with a labeled product
type as codomain¹, forming a coalgebra. Thus, the combined representations
of fields, methods and constructors of a class form a coalgebra that is used
as implementation of the class. The use of coalgebras in this paper is mostly
organisational and remains fairly superficial; it is not essential for what happens².

The paper is organised as follows. It starts with two preliminary sections:
one on the type-theoretic notation that will be used, and one about some basic
aspects of JAVA semantics. Then, Section 4 introduces interface types as labeled
products to capture the ingredients of classes, and shows how these are nested to
incorporate superclasses. Section 5 discusses hiding and overriding at the level
of these interface types, via special cast functions, and Sections 6 and 7 show
how these functions realise the appropriate late binding behaviour. Finally,
Section 8 describes two example verifications, one in PVS and one in ISABELLE/HOL,
involving small but non-trivial JAVA programs.


2 Higher Order Logic 


The actual verifications of JAVA programs in the LOOP project are done using
either³ PVS or ISABELLE/HOL, see Section 8. In this paper we shall abstract away
from the specific syntax for the higher order logic of PVS or ISABELLE/HOL,
and use a (hopefully more generally accessible) type-theoretic language. It
involves types which are built up from: type variables α, β, ..., type constants
nat, bool, string (and some more), exponent types σ → τ, labeled product (or
record) types [lab₁: σ₁, ..., labₙ: σₙ] and labeled coproduct (or variant) types
{lab₁: σ₁ | ... | labₙ: σₙ}, for given types σ, τ, σ₁, ..., σₙ. New types or type
constructors can be introduced via definitions, as in:


- TYPE THEORY
lift[α] : TYPE ≝ {bot: unit | up: α}


¹ Alternatively, one can combine these n functions into elements of a “trait type”
[f₁: Self → σ₁, ..., fₙ: Self → σₙ], like in [1, §§8.5.2].

² As a side-remark, all the encodings discussed in [4] implicitly also use coalgebras.

³ Translating to both PVS and ISABELLE/HOL offers the verifier a choice which proof
tool to use.

where unit is the empty labeled product type []. This lift type constructor adds
a bottom element to an arbitrary type, given as type variable α.

For exponent types we shall use the standard lambda abstraction λx: σ. M
and application N · L notation. For terms Mᵢ: σᵢ, we have a labeled tuple (lab₁ =
M₁, ..., labₙ = Mₙ) inhabiting the labeled product type [lab₁: σ₁, ..., labₙ: σₙ].
For a term N: [lab₁: σ₁, ..., labₙ: σₙ] in this product, we write N.labᵢ for the
selection term of type σᵢ. Dually, for a term M: σᵢ there is a labeled or tagged
term labᵢ M in the labeled coproduct type {lab₁: σ₁ | ... | labₙ: σₙ}. And for
a term N: {lab₁: σ₁ | ... | labₙ: σₙ} in this coproduct type, together with n
terms Lᵢ(xᵢ): τ, possibly containing a free variable xᵢ: σᵢ, there is a case term
CASES N OF { lab₁ x₁ ↦ L₁(x₁) | ... | labₙ xₙ ↦ Lₙ(xₙ) } of type τ. These
introduction and elimination constructions for exponents and labeled (co)products
are required to satisfy standard (β)- and (η)-conversions.

In this paper we do not use any formulas in higher order logic—which are of 
course terms of type bool—and work exclusively in the underlying type theory. 
This is possible because we describe only a limited part of the semantics of JAVA. 


3 Semantics of Java Statements and Expressions 


In this paper we shall use Self as a type variable representing a global state space.
Later, Self will be instantiated with the type OM, describing a concrete state
space. But as long as the details from OM are not needed, the type variable Self
shall be used, for abstraction. JAVA statements and expressions will be modeled
as state transformer functions (or coalgebras) acting on Self. Statements and
expressions in JAVA may either hang, terminate normally, or terminate abruptly.
These different output options are captured by two output types StatResult[Self],
and ExprResult[Self, α], in:


Self → StatResult[Self]        Self → ExprResult[Self, α]


where α is a type variable for the result type of the JAVA expression. These
output types are defined as labeled coproducts:


- TYPE THEORY

StatResult[Self] : TYPE ≝
  { hang: unit
  | norm: Self
  | abnorm: StatAbn[Self] }

ExprResult[Self, α] : TYPE ≝
  { hang: unit
  | norm: [ns: Self, res: α]
  | abnorm: ExprAbn[Self] }


The types StatAbn[Self] and ExprAbn[Self] capture the various abnormalities that 
can occur for JAVA statements and expressions (like exceptions, returns, breaks 
and continues). Their precise structure is not relevant for this paper—but can 
be found in [103.9]. 

On the basis of this representation, the denotational semantics of all of JAVA’s
language constructs, like while, catch etc., can be defined, closely following
the JAVA language specification [7]. For instance, the composition s ; t of two
statements s, t: Self → StatResult[Self] is defined as:


- TYPE THEORY

s ; t : Self → StatResult[Self] ≝
  λx: Self. CASES s · x OF {
    | hang ↦ hang
    | norm y ↦ t · y
    | abnorm a ↦ abnorm a }
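
The composition of statements can be mirrored in an ordinary programming language. The Java sketch below is an illustration only: the class `StatResultSketch`, its `Tag` enum and the use of a plain string in place of StatAbn[Self] are assumptions made for the example and are not part of the LOOP semantics.

```java
import java.util.function.Function;

/** Sketch of StatResult[Self] as a tagged union, with the composition s ; t. */
class StatResultSketch {
    enum Tag { HANG, NORM, ABNORM }

    /** One value of the coproduct { hang | norm: Self | abnorm: StatAbn[Self] }. */
    static final class StatResult<S> {
        final Tag tag;
        final S state;          // set only for NORM
        final String abnormal;  // placeholder for StatAbn[Self], set only for ABNORM
        private StatResult(Tag tag, S state, String abnormal) {
            this.tag = tag; this.state = state; this.abnormal = abnormal;
        }
        static <S> StatResult<S> hang() { return new StatResult<>(Tag.HANG, null, null); }
        static <S> StatResult<S> norm(S state) { return new StatResult<>(Tag.NORM, state, null); }
        static <S> StatResult<S> abnorm(String why) { return new StatResult<>(Tag.ABNORM, null, why); }
    }

    /** s ; t runs t on the state produced by s if s terminates normally; otherwise s's outcome is passed on. */
    static <S> Function<S, StatResult<S>> compose(Function<S, StatResult<S>> s,
                                                  Function<S, StatResult<S>> t) {
        return x -> {
            StatResult<S> r = s.apply(x);
            return r.tag == Tag.NORM ? t.apply(r.state) : r;
        };
    }

    public static void main(String[] args) {
        Function<Integer, StatResult<Integer>> inc = x -> StatResult.norm(x + 1);
        Function<Integer, StatResult<Integer>> dbl = x -> StatResult.norm(x * 2);
        System.out.println(compose(inc, dbl).apply(3).state); // prints 8
    }
}
```

As in the type-theoretic definition, a hanging or abruptly terminating first statement prevents the second statement from being executed at all.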


We do not describe all these details here, as they are not necessary to under- 
stand inheritance. What we do need in the sequel is a special type RefType for 
references. It is defined as either a null-reference null or a non-null-reference
ref x, where x consists of a pointer to a memory location, where the object that
is being referred to resides (see [3] for details), a string, indicating the run-time
type of the object, and a third field that is used if the reference points to an
array to give its dimension and length:


- TYPE THEORY

RefType : TYPE ≝
  { null: unit
  | ref: [ objpos: MemLoc,
           clname: string,
           dimlen: lift[[dim: nat, len: nat]] ] }


Recall that in object-oriented languages one must keep track of the run-time type 
of an object, because it may differ from its declared type. All references in JAVA 
(both to objects and to arrays) are translated in type theory to values of type 
RefType. Thus, if we have an object a in a class A and an object b in a subclass 
B of A, then the translation of an assignment a = b involves a replacement of 
the reference to a by the reference to b. Since both are inhabitants of RefType, 
this is well-typed. If b has run-time type B, then so will a after the assignment. 
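
The effect of such an assignment on the run-time type can be observed directly in JAVA. The small sketch below uses hypothetical classes A and B nested in an illustrative class `RuntimeTypeSketch`; the class name reported by `getClass()` plays the role of the clname entry of a reference.

```java
/** Sketch: after a = b the variable a (declared type A) refers to an object of run-time type B. */
class RuntimeTypeSketch {
    static class A {}
    static class B extends A {}

    static String runtimeClassAfterAssignment() {
        A a = new A();
        B b = new B();
        a = b;                                // well-typed: B is a subclass of A
        return a.getClass().getSimpleName();  // the run-time type, not the declared type
    }

    public static void main(String[] args) {
        System.out.println(runtimeClassAfterAssignment()); // prints B
    }
}
```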


4 Nested Labeled Product Types for Interfaces 


JAVA has classes and interfaces. Interfaces only contain the headers of methods 
(their names and the types of their parameters and results, if any), but not their 
implementations (given by what are usually called method bodies). The latter 
can only occur in classes. In this section we do not make a distinction between 
classes and interfaces, because method bodies do not play a role. Therefore, class
can also mean interface at this stage. What we describe is how certain labeled
product types (in type theory) are extracted from JAVA classes. These product
types describe the superclasses, fields (or instance variables) with associated
assignment operations, methods, and constructors of a class. They form the basis
for the type-theoretic formalisation of JAVA classes. Below, we first describe how
an appropriate labeled product type is extracted for a single isolated class, and
then how inheritance is handled, involving several related classes.

It is easiest to proceed via an example of a JAVA class: 


- JAVA
class MyClass {
    int i;
    int k = 3;

    void m(byte a, int b) { if (a > b) i = a; else i = b; }
    MyClass() { i = 6; }
}


Ignoring the implicit superclass Object, the following interface type is extracted. 


- TYPE THEORY

MyClassIFace[Self] : TYPE ≝
  [ ...  // for the superclass, see below
    i: int,
    i_becomes: int → Self,
    k: int,
    k_becomes: int → Self,
    m: byte → int → StatResult[Self],
    MyClass: ExprResult[Self, RefType] ]


There are several things worth noticing here. 


- The field declaration int i gives rise not only to a label i: int for field
  lookup in the product type but also to an associated assignment operation,
  with label i_becomes. This assignment operation takes an integer as input,
  and produces a new state in Self, in which the i field is changed to the
  argument of the assignment operation (and the rest is unchanged). Similarly
  for k. Variable initialisers (like k = 3) are ignored at this stage, since they
  are irrelevant for the interface type (just as method bodies).

- The method m, which is a void method, is modeled as an entry m in the
  labeled product with result type StatResult[Self]⁴. Similarly, methods with a
  return value are modeled with ExprResult, e.g. int n () {return 3;} would give
  rise to a label n with type ExprResult[Self, int].

- The type of the constructor MyClass is implicit in the JAVA code, but is
  made explicit in the type-theoretic formalisation. Since a constructor returns
  a reference to a newly created object, it is modeled as an entry with type
  ExprResult[Self, RefType]. Constructors are often left implicit in JAVA code, as
  so-called default constructors. These are added explicitly to interface types.


⁴ To prevent name clashes, the LOOP compiler does more. For example, overloading of
labels usually is not allowed in labeled product types. Therefore the LOOP compiler
does not use m but m_byte_int as translation of m in JAVA. Here we shall ignore such
bureaucratic aspects, and simply assume that no such name clashes occur.


Thus, a labeled product term in the interface type MyClassIFace contains
all the operations (i.e. field access, field assignment, method calls and object
construction) that can be applied to instances of MyClass.

The types occurring in the interface type MyClassIFace above describe the
“visible” signatures of the fields, methods and constructors in the JAVA class
MyClass. But in object-oriented programs there is always an implicit argument to
a field/method/constructor, namely the current state of the object on which the
field/method/constructor is invoked. This is made explicit by modeling classes
as coalgebras for interface types, i.e. as functions of the form:


Self → MyClassIFace[Self]

Such a coalgebra actually combines the fields, methods and constructors of the
class in a single function. The individual operations are made explicit, using the
isomorphism Self → [f₁: σ₁, ..., fₙ: σₙ] ≅ [f₁: Self → σ₁, ..., fₙ: Self → σₙ],
via what we call “extraction” functions:


- TYPE THEORY

Assuming a variable c: Self → MyClassIFace[Self],

⊢ i(c): Self → int ≝ λx: Self. (c · x).i
⊢ i_becomes(c): Self → int → Self ≝ λx: Self. (c · x).i_becomes
⊢ k(c): Self → int ≝ λx: Self. (c · x).k
⊢ k_becomes(c): Self → int → Self ≝ λx: Self. (c · x).k_becomes
b: byte, j: int ⊢ m(b)(j)(c): Self → StatResult[Self] ≝ λx: Self. ((c · x).m) · b · j
⊢ MyClass(c): Self → ExprResult[Self, RefType] ≝ λx: Self. (c · x).MyClass


Note that we use a form of overloading: the i in (c · x).i is a label from a product,
whereas the i in i(c) is the extraction function that is being defined. Note also
that for fields like i, besides the look-up function i, an additional update
function i_becomes is defined. The coalgebra c: Self → MyClassIFace[Self] above thus
combines all the operations of the class MyClass. In the remainder of this paper,
we shall always describe the operations of a class, say A (its fields with their
assignments, its methods and its constructors), using extraction functions as above,
with respect to a coalgebra of type AIFace.
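
The idea of a class as a coalgebra with extraction functions can be sketched in Java as follows. This is an illustration under simplifying assumptions, not the output of the LOOP compiler: the state space `State` is hypothetical, the superclass entry and the constructor are omitted, and the method m returns the next state directly instead of a StatResult.

```java
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.IntFunction;

/** Sketch of the class MyClass as a coalgebra Self -> MyClassIFace[Self]. */
class MyClassCoalgebraSketch {
    /** Hypothetical concrete state space standing in for Self. */
    static final class State {
        final int i, k;
        State(int i, int k) { this.i = i; this.k = k; }
    }

    /** Simplified counterpart of MyClassIFace[Self]. */
    static final class MyClassIFace {
        final int i, k;
        final IntFunction<State> iBecomes, kBecomes;
        final BiFunction<Byte, Integer, State> m;
        MyClassIFace(int i, IntFunction<State> iBecomes, int k, IntFunction<State> kBecomes,
                     BiFunction<Byte, Integer, State> m) {
            this.i = i; this.iBecomes = iBecomes; this.k = k; this.kBecomes = kBecomes; this.m = m;
        }
    }

    /** The coalgebra c: for each state x it yields all operations available in x. */
    static final Function<State, MyClassIFace> C = x -> new MyClassIFace(
        x.i, v -> new State(v, x.k),
        x.k, v -> new State(x.i, v),
        (a, b) -> new State(a > b ? a : b, x.k));  // body of m: if (a > b) i = a; else i = b;

    /** Extraction functions: each one applies c to the state and selects one entry. */
    static Function<State, Integer> i(Function<State, MyClassIFace> c) { return x -> c.apply(x).i; }
    static Function<State, IntFunction<State>> iBecomes(Function<State, MyClassIFace> c) {
        return x -> c.apply(x).iBecomes;
    }
    static Function<State, State> m(byte b, int j, Function<State, MyClassIFace> c) {
        return x -> c.apply(x).m.apply(b, j);
    }
}
```

For instance, `i(C).apply(iBecomes(C).apply(s).apply(6))` looks up i after assigning 6 to it in a state `s`, which mirrors the roles of i and i_becomes in the interface type.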


4.1 Inheritance and Nested Interface Types


Now that we have seen the basic idea of how to build an interface type from 
the fields, methods and constructors of a JAVA class, we proceed to incorporate 
superclasses. This will be done via nesting of interface types. Again, it is easiest 
to use an example. 


- JAVA
class MySubClass extends MyClass {
    int j;
    int n(byte b) { m(b, 3); return i; }
}


This new class MySubClass inherits the field i and method m of MyClass, 
and it declares its own field j and method n. As can be seen in the body of the 
method n, the methods and fields from the super class are immediately available, 
i.e. the method m and field i are called without any further reference to MyClass. 
This should also be possible in our formalisation. 

This class gives rise to the following interface type MySubClassIFace in type
theory. In this labeled product type the first entry has type MyClassIFace, the
interface type defined earlier for the class MyClass, thus formalising the
inheritance relationship. In a similar way, the type MyClassIFace contains an entry
super_Object: ObjectIFace[Self], formalising the implicit inheritance from Object
by MyClass.


- TYPE THEORY

MySubClassIFace[Self] : TYPE ≝
  [ super_MyClass: MyClassIFace[Self],
    j: int,
    j_becomes: int → Self,
    n: byte → ExprResult[Self, int],
    MySubClass: ExprResult[Self, RefType] ]


As before, we shall consider a coalgebra c: Self → MySubClassIFace[Self] as
representation of the class MySubClass. For such a coalgebra we can again define
extraction functions, giving us access to all ingredients of MySubClass, but also
of MyClass, via the nesting of interfaces. This goes as follows.


- TYPE THEORY

Assuming a variable c: Self → MySubClassIFace[Self],

⊢ j(c): Self → int ≝ λx: Self. (c · x).j
⊢ j_becomes(c): Self → int → Self ≝ λx: Self. (c · x).j_becomes
b: byte ⊢ n(b)(c): Self → ExprResult[Self, int] ≝ λx: Self. ((c · x).n) · b
⊢ MySubClass(c): Self → ExprResult[Self, RefType] ≝ λx: Self. (c · x).MySubClass
// continue with the superclass MyClass
⊢ i(c): Self → int ≝ λx: Self. (c · x).super_MyClass.i
⊢ i_becomes(c): Self → int → Self ≝ λx: Self. (c · x).super_MyClass.i_becomes
b: byte, j: int ⊢ m(b)(j)(c): Self → StatResult[Self] ≝ λx: Self. ((c · x).super_MyClass.m) · b · j
// etc.


The repeated extraction functions, like i, i_becomes and m, thus give
immediate access to all ingredients of superclasses. Note how this involves
overloading in type theory, because for instance i(c) is defined both for coalgebras
c: Self → MyClassIFace[Self] and for coalgebras c: Self → MySubClassIFace[Self]
representing the classes MyClass and MySubClass.


5 Overriding and Hiding 


In the previous section we have seen an example of inheritance where the subclass 
MySubClass simply adds an extra field and method to the superclass. But the 
same fields and methods may also be repeated in subclasses. In JAVA this is called 
hiding of fields, and overriding of methods. Different names are used, because the 
mechanisms are different: field selection is based on the static type of receiving 
objects, whereas method selection is based on the dynamic (or run-time) type of 
an object. The latter mechanism is often referred to as dynamic method lookup, 
or late binding. Consider the following example. 


- JAVA
class A {
    int i = 1;
    int m() { return i * 100; }
}

class B extends A {
    int i = 10;
    int m() { return i * 1000; }
}

class Test {
    int test1() {
        A[] ar = { new A(), new B() };
        return ar[0].i + ar[0].m() + ar[1].i + ar[1].m();
    }
}


310 M. Huisman and B. Jacobs 


The field i in the subclass B hides the field i in the superclass A, and the
method m in B overrides the method m in A. In the test1 method of class Test
a local variable ar of type ‘array of As’ is declared and initialised with length 2
containing a new A object at position 0, and a new B object at position 1.
Note that at position 1 there is an implicit conversion from B to A to make the
new B object fit into the array of As. Interestingly, the test1 method will return
ar[0].i + ar[0].m() + ar[1].i + ar[1].m(), which is 1 + 1 * 100 + 1 +
10 * 1000 = 10102. The reason is that when new B() is converted to type A the hidden
field becomes visible again (so that the field ar[1].i is statically bound to i in
A), but the overriding method replaces the original method (so that the method
ar[1].m() leads to execution of m in B, which uses the field i from B). See [2,
§§3.4], or also [7, §§8.4.6.1]:


Note that a qualified name or a cast to a superclass is not effective in 
attempting to access an overridden method; in this respect, overriding 
of methods differs from hiding of fields. 
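
The quoted rule can be checked with a runnable variant of the example. In the sketch below the classes A and B are nested in an illustrative wrapper class `HidingVsOverridingSketch` (a name chosen here, not taken from the paper) so that the fragment is self-contained.

```java
/** Runnable variant of the example: fields are bound statically, methods dynamically. */
class HidingVsOverridingSketch {
    static class A { int i = 1;  int m() { return i * 100; } }
    static class B extends A { int i = 10; @Override int m() { return i * 1000; } }

    static int test1() {
        A[] ar = { new A(), new B() };
        return ar[0].i + ar[0].m() + ar[1].i + ar[1].m();
    }

    public static void main(String[] args) {
        B b = new B();
        System.out.println(((A) b).i);    // prints 1: the cast makes the hidden field of A visible
        System.out.println(((A) b).m());  // prints 10000: the cast does not undo overriding
        System.out.println(test1());      // prints 10102
    }
}
```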


This difference in binding for redefined fields and methods is typical for JAVA. 
For example in EIFFEL [16] it is not allowed to redefine fields in subclasses. 

It is a challenge to provide a semantics for this behaviour. We do so by 
using a special cast function between coalgebras, which performs appropriate 
replacements of methods and fields. We shall illustrate this in the above JAVA 
example. The interface types for classes A and B are defined as follows. 


— TYPE THEORY


AIFace[Self] : TYPE ≝                    BIFace[Self] : TYPE ≝
[ super_Object : ObjectIFace[Self],      [ super_A : AIFace[Self],
  i : int,                                 i : int,
  i_becomes : int → Self,                  i_becomes : int → Self,
  m : ExprResult[Self, int],               m : ExprResult[Self, int],
  A : ExprResult[Self, RefType] ]          B : ExprResult[Self, RefType] ]


Notice that the interface type BIFace[Self] contains m and i twice: once di-
rectly, and once inside the nested interface type AIFace[Self]. Thus we define two
extraction functions to access the individual operations for each of them:


— TYPE THEORY


Assuming a variable c : Self → BIFace[Self],
⊢ i(c) : Self → int ≝ λx : Self. (c · x).i
⊢ A_i(c) : Self → int ≝ λx : Self. (c · x).super_A.i
⊢ m(c) : Self → ExprResult[Self, int] ≝ λx : Self. (c · x).m
⊢ A_m(c) : Self → ExprResult[Self, int] ≝ λx : Self. (c · x).super_A.m


The extraction functions A_i and A_m are used for super invocations.

What we want is a way of “casting” a B coalgebra c: Self → BIFace[Self] to
an A coalgebra B2A(c): Self → AIFace[Self] which incorporates the differences
between hiding and overriding. Just taking the super_A entry is not good enough
because then we get fields and methods from the superclass: we need additional
updates, which select the fields of the superclass A, but the methods of the sub-
class B. Therefore, we use a record update on the entry super_A, which updates
the method entries to the methods of B, defining:


— TYPE THEORY


c : Self → BIFace[Self] ⊢
B2A(c) : Self → AIFace[Self] ≝
    λx : Self. (c · x).super_A WITH (m := m(c) · x)


As a result, m(B2A(c)) = m(c), and i(B2A(c)) = i(super_A(c)). 
In general, all overriding methods from a subclass replace the methods from 
its superclass. Hidden fields reappear after casting because they are not replaced. 
These cast operations are always defined for all cases of inheritance between 
the JAVA classes that are considered. Notice that the cast operations work “tran- 
sitively”, in the sense that given class C extending class B, and class B extending 
class A, the generated functions C2B, B2A and C2A are such that: 


B2A(C2B(c)) = C2A(c). 
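
The cast can be mimicked in JAVA itself, which may help to see what the record update does. The sketch below is only an illustration under simplifying assumptions: interface types become small immutable classes, a coalgebra becomes a function from a state to such an object, ExprResult is collapsed to a plain int, and the names CastSketch, State, withM, bCoalgebra and b2a are hypothetical (they are not produced by the LOOP tool).

```java
import java.util.function.Function;
import java.util.function.IntSupplier;

final class CastSketch {
    /** Hypothetical state space: the stored values of A's i and B's i. */
    static final class State {
        final int iOfA, iOfB;
        State(int iOfA, int iOfB) { this.iOfA = iOfA; this.iOfB = iOfB; }
    }

    /** Stand-in for AIFace[Self]: a field i and a method m. */
    static final class AIFace {
        final int i;
        final IntSupplier m;
        AIFace(int i, IntSupplier m) { this.i = i; this.m = m; }
        /** Plays the role of the record update WITH (m := newM). */
        AIFace withM(IntSupplier newM) { return new AIFace(i, newM); }
    }

    /** Stand-in for BIFace[Self]: the nested super_A entry plus B's own i and m. */
    static final class BIFace {
        final AIFace superA;
        final int i;
        final IntSupplier m;
        BIFace(AIFace superA, int i, IntSupplier m) {
            this.superA = superA; this.i = i; this.m = m;
        }
    }

    /** A B coalgebra that follows the Java classes A and B of the example. */
    static BIFace bCoalgebra(State x) {
        AIFace superA = new AIFace(x.iOfA, () -> x.iOfA * 100);
        return new BIFace(superA, x.iOfB, () -> x.iOfB * 1000);
    }

    /** B2A(c): take the super_A entry and replace its m by the m of B. */
    static Function<State, AIFace> b2a(Function<State, BIFace> c) {
        return x -> c.apply(x).superA.withM(c.apply(x).m);
    }

    public static void main(String[] args) {
        AIFace cast = b2a(CastSketch::bCoalgebra).apply(new State(1, 10));
        System.out.println(cast.i);             // the field comes from A
        System.out.println(cast.m.getAsInt());  // the method comes from B
    }
}
```

Only the method entry is updated, so the field i of the result is still the one stored in super_A: this is the hidden field reappearing after the cast.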


6 Handling Late Binding 


The example in the previous section involves late binding: ar[1] has static type
A, but invoking ar[1].m() results in the execution of m from B, because ar[1]
has run-time type B. We shall study this mechanism in more detail. First, in 
this section we concentrate on late binding within the current object (on this if 
you like), and later, in the next section, we concentrate on method invocations 
on different objects. 

Suppose for now that the class A from the previous section also contains a 
method n which simply calls m, and is used in B, as in: 


— JAVA


class A {
    ... // as before
    int n() { return i + m(); }
}

class B extends A {
    ... // as before
    int test2() { return n(); }
}


Again due to late binding, test2 returns the value of the field i from A plus the 
result from the method m in B, since, as explained earlier, field selection is based 
on the static type and method selection is based on dynamic types. Since the 
run-time type of the object in which test2 is executed is B, late binding ensures 
the execution of m from B. Thus, test2 returns the value of i from A plus 1000 times
the value of i from B.
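
With the initial values of the example (i is 1 in A and 10 in B) this can be checked by running the classes. The sketch below spells out the elided parts "as before"; the driver class LateBindingDemo is a name introduced here for illustration only.

```java
class A {
    int i = 1;
    int m() { return i * 100; }
    int n() { return i + m(); }   // i is A.i; the call m() is dispatched at run time
}

class B extends A {
    int i = 10;
    @Override
    int m() { return i * 1000; }
    int test2() { return n(); }
}

// Hypothetical driver class, not part of the original example.
class LateBindingDemo {
    public static void main(String[] args) {
        System.out.println(new B().test2());  // A.i + 1000 * B.i
        System.out.println(new A().n());      // A.i + 100 * A.i
    }
}
```

The body of n is the same in both calls; only the method that m() resolves to differs.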

This behaviour is realised in our semantics by using the method bodies of A, 
in particular the body of n, with appropriate casts from a B coalgebra to an A 
coalgebra. Before we can see how this works, we need to know a bit more about 
method bodies. 


6.1 Formalisation of Method Bodies 


Space restrictions prevent us from explaining the details about the translation 
of JAVA method bodies into type theory, as performed by the Loop tool. The- 
refore we concentrate on what is relevant here, necessarily leaving many things 
unexplained. More details may be found in [15,10,3,9]. So far we have used a 
type variable Self for the state space. In the actual translation, a fixed type 
OM is used. It represents the underlying memory model, see [3]. It consists of 
three parts: a heap, a stack, and a static part, each with an infinite series of 
memory cells and a ‘top’ position indicating the next unused cell. Several ‘put’ 
and ‘get’ operations are defined for writing and reading from this memory, at 
various locations. 

The type-theoretic translation of the body of the method n from A looks as 
follows. 


— TYPE THEORY


c : OM → AIFace[OM] ⊢
nbody(c) : OM → ExprResult[OM, int] ≝
    λx : Self. LET ret_n : OM → int = get_int(stack(stacktop(x), 0)),
                   ret_n_becomes : OM → int → OM =
                       put_int(stack(stacktop(x), 0))
               IN (CATCH-EXPR-RETURN(stacktop_inc ;
                       E2S(A2E(ret_n_becomes(F2E(i(c)) + m(c)))) ;
                       RETURN)(ret_n) @@ stacktop_dec)(x)


The reader is not expected to understand all details about the translation of this 
method body, as that is not really needed at this stage. We briefly explain the 
basics: first, a special local variable ret_n is declared, together with an associated
assignment operation, and is bound to a particular position on the stack. This
variable ret_n is used to temporarily hold the result of the computation. At the
end it is read by the CATCH-EXPR-RETURN function. But first, the stacktop is
incremented, so that later method calls do not interfere with the values in the
cell used for this method (where the value for ret_n is stored). The actual body
return i + m() gets translated into an assignment of F2E(i(c)) + m(c) to the
return variable via ret_n_becomes, followed by the return statement RETURN.
Here F2E is an auxiliary function used to turn a field access function into
an expression, and similarly E2S and A2E produce expressions with appropriate
types. At the very end, the stacktop is decremented again, freeing the used cell
on the stack. Hopefully, this explanation conveys the main idea of what is
going on, namely:


nbody(c) = ··· i(c) + m(c) ···


The important thing to note is that the definition of nbody is parameterised⁵
by an A coalgebra c: OM → AIFace[OM]. In the translation of A, the call n(c)
rewrites to nbody(c). The whole trick in getting late binding to work correctly
is to have the (repeated) extraction function n(d) for a B coalgebra d: OM →
BIFace[OM] rewrite to the method body nbody(B2A(d)), which is the body as in
A, but with a casted coalgebra. The effect is summarised in the following table.


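Class                    | binding          | m in nbody  | i in nbody
-------------------------+------------------+-------------+----------------
A with coalgebra         | n(c) to          | m(c)        | i(c)
c: OM → AIFace[OM]       | nbody(c)         |             |
-------------------------+------------------+-------------+----------------
B with coalgebra         | n(d) to          | m(B2A(d))   | i(B2A(d)) =
d: OM → BIFace[OM]       | nbody(B2A(d))    | = m(d)      | i(super_A(d))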


This is precisely what we want, namely that the method call to m in n in B, 
i.e. m in nbody(B2A(d)), is m from B, whereas i in nbody(B2A(d)) is i from 
A. The coalgebra by which nbody is parametrised thus formalises the method 
lookup table of the current object. 

In conclusion, late binding is realised by binding in subclasses the repea- 
ted extraction functions of methods from superclasses to the bodies from the 
superclasses, but with casted coalgebras. 
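
The role of the coalgebra as a method lookup table can be imitated by passing the table explicitly to a single shared body. This is a deliberately small sketch: the stack handling of nbody is left out, the interface is evaluated at one fixed state, and the names MethodTableSketch and AIFace (as a Java class) are hypothetical.

```java
import java.util.function.IntSupplier;

final class MethodTableSketch {
    /** Stand-in for a value of AIFace[OM] at one fixed state. */
    static final class AIFace {
        final int i;
        final IntSupplier m;
        AIFace(int i, IntSupplier m) { this.i = i; this.m = m; }
    }

    /** nbody(c) = ... i(c) + m(c) ...: one body, parameterised by the table c. */
    static int nbody(AIFace c) {
        return c.i + c.m.getAsInt();
    }

    public static void main(String[] args) {
        AIFace a = new AIFace(1, () -> 1 * 100);           // table of a plain A object
        AIFace bCastToA = new AIFace(1, () -> 10 * 1000);  // B2A(d): i from A, m from B
        System.out.println(nbody(a));         // n executed on an A object
        System.out.println(nbody(bCastToA));  // n executed on a B object
    }
}
```

The body is written once, for A; which m it reaches is decided entirely by the table it is given.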


7 Method Calls to Other Objects 


In this section we consider method calls of the form o.m(), where o is a “recei- 
ving” or “component” object. Field access o.i is not discussed explicitly, but is 
handled similarly. Examples occurred in Section 5, where o was an array access
ar[0] or ar[1].


⁵ In reality it is even more complicated, since the method body has more parameters
than are mentioned here. 
So far we have been using coalgebras with type OM → AIFace[OM] to capture
the ingredients of a class A. These coalgebras actually have two more parameters,
namely a memory position, of type MemLoc, and a string. The memory position
points to the location in memory where the values of the fields of the object
are stored. The string can be the name of the class that the coalgebra itself
represents (like “A”), or the name of one of its subclasses, representing the run-
time type of an object. Thus, we use parametrised coalgebras of type string →
MemLoc → OM → AIFace[OM].


For each class, say A, a specific coalgebra A_clg: string → MemLoc → OM →
AIFace[OM] is assumed, with requirement:


A_clg(“A”)(p) implements A.


This implementation requirement expresses that fields, methods and construc- 
tors in A_clg(“A”) behave as described, for example, in their method bodies. If 
A has a subclass B, then additional requirements are imposed, namely, 


A_clg(“B”)(p) = B2A(B_clg(“B”)(p)) and B_clg(“B”) implements B


(And similarly, for further subclasses.) The first of these requirements expresses 
that the implementation of A on an object with run-time type B behaves like the 
implementation of B, casted to A. 


Why is this relevant? Consider a JAVA method invocation expression o.m(), 
where the receiving object o has static type A, say, and m is non-void (i.e. has 
a return type). This expression is translated via an auxiliary function CE2E⁶,
namely as ⟦o.m()⟧ = CE2E(A_clg)(⟦o⟧)(m). This function CE2E first evaluates
⟦o⟧; if ⟦o⟧ terminates normally, this produces a value in RefType, see the end of
Section 3. In case this result is a null-reference, a NullPointerException will be
thrown; if it is a non-null-reference, it contains a memory location p and a string s
(describing o’s run-time type). The method m(A_clg · s · p) will then be evaluated,
corresponding to execution of m by the receiving object o—stored at location p 
in OM—with the run-time type of o determining the implementation of m that is 
chosen. Using the implementation requirements on coalgebras from the previous 
paragraph and the coalgebra-cast functions, the appropriate body for m is found. 
All this is in accordance with the explanation of method invocation in the JAVA 
language specification [7, §§15.11]. 


The function CE2E is defined as follows. 


⁶ CE2E stands for “Component-Expression-to-Expression”.
— TYPE THEORY


c : string → MemLoc → OM → IFace,
o : OM → ExprResult[OM, RefType],
m : (OM → IFace) → OM → ExprResult[OM, α] ⊢
CE2E(c)(o)(m) : OM → ExprResult[OM, α] ≝
    λx : OM. CASES o · x OF {
        | hang ↦ hang
        | norm y ↦
            CASES y.res OF {
                | null ↦ “NullPointerException”
                | ref r ↦ m(c · (r.clname) · (r.objpos)) · (y.ns) }
        | abnorm a ↦ abnorm a }


Notice that such a method invocation hangs, or terminates abruptly if the re- 
ceiving object o does, and also that the possible side-effect of evaluating o is 
passed on to the method m, via the state y.ns. The details of how exceptions are 
thrown are not relevant here, and are omitted. 

The main point is: if we have an object a in a class A and an object b in
a subclass B of A, both with a method m, then after an assignment a = b the
run-time type of a (given by the clname label) is equal to the run-time type of
b, and so a method invocation a.m() will have the same effect as b.m(), since


m(A_clg(“B”)(p)) = m(B2A(B_clg(“B”)(p))) = m(B_clg(“B”)(p)),


where p is the memory location of a (and b). But note that a field access ex- 
pression a.i may yield a different result from b.i! 
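
The selection step performed by CE2E can be sketched in JAVA as follows. This is a simplified illustration, not the definition above: the hang and abnorm cases and the threading of the state y.ns are left out, the memory location is carried along but not used, field values are fixed to those of the running example, and the names Ce2eSketch, Ref, aClg and ce2e are hypothetical.

```java
import java.util.function.IntSupplier;
import java.util.function.ToIntFunction;

final class Ce2eSketch {
    /** Stand-in for a value of AIFace[OM] at one fixed state. */
    static final class AIFace {
        final int i;
        final IntSupplier m;
        AIFace(int i, IntSupplier m) { this.i = i; this.m = m; }
    }

    /** A non-null reference: run-time class name plus memory location. */
    static final class Ref {
        final String clname;
        final int objpos;
        Ref(String clname, int objpos) { this.clname = clname; this.objpos = objpos; }
    }

    /** Stand-in for A_clg, indexed by run-time class name and location. */
    static AIFace aClg(String clname, int objpos) {
        if (clname.equals("B")) {
            // A_clg("B")(p) = B2A(B_clg("B")(p)): field i of A, method m of B.
            return new AIFace(1, () -> 10 * 1000);
        }
        return new AIFace(1, () -> 1 * 100);   // A_clg("A")(p)
    }

    /** Simplified CE2E: null check, then select the interface by run-time type. */
    static int ce2e(Ref o, ToIntFunction<AIFace> member) {
        if (o == null) {
            throw new NullPointerException();
        }
        return member.applyAsInt(aClg(o.clname, o.objpos));
    }

    public static void main(String[] args) {
        Ref b = new Ref("B", 0);
        Ref a = b;                                              // a = b
        System.out.println(ce2e(a, f -> f.m.getAsInt()));       // m of B
        System.out.println(ce2e(a, f -> f.i));                  // i of A
        System.out.println(ce2e(new Ref("A", 1), f -> f.m.getAsInt()));
    }
}
```

After the assignment the reference a carries the class name of b, so the method selected through A_clg is that of B, while the field read through the A interface remains that of A.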


8 Example Verifications 


Next it will be described how the semantics, as sketched in the previous sections, 
is used for tool-supported reasoning about JAVA classes. Actually, no explicit rea- 
soning principles are needed for handling inheritance, because automatic rewrit- 
ing takes care of proper method selection. Therefore, inheritance requires no 
special attention in verification—but remains difficult in specification. This is of 
course very convenient, and a good reason for using this particular semantics. 

We shall describe two example verifications, based on translations of JAVA 
programs by the LOOP tool. The first verification is in PVS, and the second one in
ISABELLE/HOL. Here we shall no longer use the type-theoretic syntax of earlier 
sections, but use PVS and ISABELLE syntax. The first verification is about the 
JAVA classes in Section 5, and establishes the properties mentioned there. The 
PVS statements that have been proved are: 
— PVS


IMPORTING ... % code generated by the LOOP tool is loaded


test1 : LEMMA p < heap?top(x) IMPLIES
          norm??(test1?(Test?clg("Test")(p))(x))
        AND
          res?(test1?(Test?clg("Test")(p))(x)) = 10102


test2 : LEMMA p < heap?top(x) IMPLIES
          norm??(test2?(B?clg("B")(p))(x))
        AND
          res?(test2?(B?clg("B")(p))(x)) =
            i(B?2?A(B?clg("B")(p)))(x) + i(B?clg("B")(p))(x) * 1000


The first lemma test1 states that evaluation of test1 terminates normally, 
returning 10102. The second lemma states that evaluation of test2 also ter- 
minates normally, and the return value equals the value of i from A, plus 1000 
times the value of i from B. 

The PVS code contains lots of question marks ‘?’, which are there only to
prevent possible name clashes with JAVA identifiers (which cannot contain ‘?’).
Both lemmas have a technical assumption p < heap?top(x) requiring that the
position p of the receiving object is in the allocated part of the heap memory.
The proofs of both these lemmas proceed entirely by automatic rewriting⁷, and
the user only has to tell PVS to load appropriate rewrite rules, and to start
reducing. The functions CE2E and B2A play a crucial rôle in this verification.
Hopefully the reader appreciates the semantic intricacies involved in the proof 
of the first lemma: array creation and access, local variables, object creation, 
implicit casting, and late binding. 

The second verification deals with the following JAVA program. 


— JAVA


class C {
    void m() throws Exception { m(); }
}

class D extends C {
    void m() throws Exception { throw new Exception(); }
    void test() throws Exception { super.m(); }
}


⁷ To give an impression, the proof of test1 involves 790 rewrite steps, taking about
67 sec., on a 450 MHz Pentium III with 128 MB RAM under Linux.

At a first glance, one might think that evaluation of the method test will not
terminate, but on the contrary, it throws an exception. In the body of test the 
method m of C is called. This method calls m again, but—due to late binding— 
this results in execution of m in D. However, if m is called on an instance of class C 
directly, this will not terminate. The ISABELLE/HOL statements that have been 
proved are the following. 


— ISABELLE


(* Code generated by the LOOP tool is loaded *)
Goal "p < heap_top x ==> \
\   case DInterface.test_ (D_clg ''D'' p) x of \
\     Hang => False \
\   | Norm y => False \
\   | Abnorm a => True";
(* Simplifier *)
qed "m_in_D_Abnorm";


Goal "p < heap_top x ==> \
\   case CInterface.m_ (C_clg ''C'' p) x of \
\     Hang => True \
\   | Norm y => False \
\   | Abnorm a => False";
(* Proof *)
qed "m_in_C_hangs";


These lemmas state that evaluation of m on an object with run-time type D will 
terminate abnormally, while evaluation of m on an object with run-time type C 
will not terminate, i.e. will hang. 
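
The first of these two behaviours is easy to observe on a JAVA virtual machine. In the sketch below the classes C and D are those of the program above; the class SuperCallDemo and its method testThrows are hypothetical helpers added for illustration. The call new C().m() is intentionally not executed: the semantics above models it as non-termination, whereas an actual virtual machine would typically abort the unbounded recursion with a StackOverflowError.

```java
class C {
    void m() throws Exception { m(); }
}

class D extends C {
    @Override
    void m() throws Exception { throw new Exception(); }
    void test() throws Exception { super.m(); }
}

// Hypothetical driver class, not part of the original program.
class SuperCallDemo {
    /** Returns true iff new D().test() terminates by throwing an Exception. */
    static boolean testThrows() {
        try {
            new D().test();
            return false;
        } catch (Exception e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(testThrows());
    }
}
```

The call super.m() statically selects the body of m in C, but the recursive call inside that body is late bound and therefore reaches m in D.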

In the ISABELLE code the full name (including the theory name) is used for the 
extraction functions. This is to prevent name clashes, due to overloading. Again, 
the technical assumption p < heap_top x is used. The proof of the first lemma
proceeds entirely by automatic rewriting⁸, after the user has added appropriate
rewrite rules to the simplifier. The crucial point in this verification is the binding
of the extraction function for super.m on a D coalgebra d: OM → DIFace[OM]
to the method body C_mbody(D2C(d)).

The verification of the second lemma requires some more care, since it cannot
be done via automatic rewriting (as this would loop). To prove non-termination,
several unfoldings and an appropriate induction are necessary. 


9 Conclusions 


We have described the main ideas of the inheritance semantics in the LOOP 
project for reasoning about JAVA classes, and shown the practical usability of
this semantics in two example verifications, where late binding was handled by
automatic rewriting, both in PVS and in ISABELLE/HOL.


⁸ On a Pentium II 266 MHz with 96 MB RAM, running Linux, this takes about 71
sec, involving 5070 rewrite steps (including rewriting of conditions).

For more complicated examples, a Hoare logic can be used for reasoning [10]. 
The largest case study that has been done so far is the verification of a non-
trivial class invariant for JAVA’s Vector class [11]. This verification gives an
impression of the size of JAVA programs that can be handled. Currently, this 
approach is also applied to the JavaCard API, see also [22]. 
Abstract. We introduce a coinductively-defined refinement relation on
sequential non-deterministic reactive systems that guarantees total cor- 
rectness. It allows the more refined system to both have less non-deter- 
minism in its outputs and to accept more inputs than the less refined 
system. Data reification in VDM is a special case of this refinement. 
Systems are considered at what we have called fine and medium levels 
of granularity. At the fine-grain level, a system’s internal computational 
steps are described. The fine-grain level abstracts to a medium-grain 
level where only input/output and termination behaviour is described. 
The refinement relation applies to medium grain systems. 

The main technical result of the paper is the proof that refinement is re- 
spected by contexts constructed from fine grain systems. In other words, 
we show that refinement is a precongruence. 

The development has been mechanized in PVS to support its use in case 
studies. 


1 Introduction 


Refinement. Refinement is a fundamental verification methodology and has a 
strong conceptual appeal. It takes a black-box view of systems, characterizing 
them by their observable interface behaviour. 

Let A be an abstract system, C a concrete system, and assume that it has
been shown that A refines to C, written A ⊑ C. A good definition of ⊑ and
theory of refinement then provide a guarantee that we can substitute C for A in
any environment with no observable consequences.

Formally, one way to give evidence for substitutivity is to show a precongru-
ence property:

⊢ A ⊑ C ⇒ ℰ[A] ⊑ ℰ[C]


of ⊑ for a class of contexts or environments ℰ.

This paper introduces a new definition of a refines-to relation for sequen- 
tial non-deterministic systems that addresses weaknesses of previously proposed 
relations. The main technical result is to prove this refines-to relation to be a 
precongruence for a general class of environments. 
Inclusion-Based Refinement. Many common definitions of a refinement relation
involve inclusion. For example refinement might assert that a step transition 
relation of the concrete system is included in that of the abstract system, or 
that every trace of the concrete system is also a trace of the abstract system, a 
trace being a sequence of observable states or input/output values. Such defini- 
tions have several problems, as we explain in the next two subsections. We use 
trace inclusion as an example, but our remarks apply to other inclusion-based 
definitions too. 


Contravariance of Inputs. A consequence of a trace-inclusion definition is that, 
if there is some step of behaviour in the concrete trace corresponding to the 
environment passing the system some input, there must be a similar step in the 
abstract trace. This is intuitively the wrong way round and allows a bad concrete 
system to inadvertently constrain environment behaviour and falsely appear to 
be correct. A variety of approaches have tried to deal with this. For example, 
the notion of receptivity is introduced [5]. 


Total Correctness. We consider it important that a refinement relation capture 
total correctness. Without totality, it is much harder to argue that a concrete 
system could replace an abstract system with no observable consequences. 

Trace inclusion is a partial correctness rather than total correctness notion. 
It requires that when the concrete system makes some step of behaviour passing 
output to the environment, the abstract system must make some matching step. 
However it doesn’t ever require that the concrete system make any output step 
in the first place. 

As explained in [4], one adaptation for total correctness is to introduce an 
extra value ⊥ into the state spaces of systems. If a system originally is not
guaranteed to make an output step from some given state, a transition to ⊥
is added, along with a transition to every other state. There is therefore no
possibility for a system to be blocked from making a step: all systems are total.
Furthermore, on starting from a ⊥ state, a system must non-deterministically be
able to transition to every possible state (including ⊥ again). With this setup,
trace inclusion requires that, whenever the abstract system is capable of making
only controlled steps (i.e. not to a ⊥ state), the concrete system also can only
make controlled steps, and so inclusion now enforces total correctness. 


The Proposed Refines-to Relation. We propose a refines-to relation that directly 
captures the desirable contravariant relationship between inputs of abstract and 
concrete systems, and that ensures total correctness without the complications 
of adding extra ⊥ states. We define refines-to coinductively. See Sec. 4 for details.


Fine and Medium Grain Systems. We focus our attention on non-deterministic 
sequential systems that alternately accept an input value from some environment 
and return an output value back to the environment. We assume systems can 
modify some internal state and that this state is preserved between returning an 
output value and accepting some next input. 
We use an automata-based fine-grain model for describing system imple-
mentations. See Sec. 5. This model represents atomic computation steps and
can exhibit phenomena such as divergence and deadlock. Fine grain systems 
abstract to a medium-grain level where just the input/output and termination 
behaviour of systems is captured. The medium grain level is also appropriate 
for directly creating system specifications. See Sec. 3 for the medium grain sy- 
stem definition. This medium grain formalism uses precondition and transition 
relations and is very similar to the way systems are described in VDM [10], for 
example. 

Refines-to is defined only on medium grain systems. It is independent of how 
we characterise systems at the fine grain. For example, for the fine grain model 
we could have used instead a structured operational semantics that captures 
total correctness. One advantage of an automata-based approach to fine grain 
systems is that the characterisation of when systems terminate is direct and 
obviously correct. 

When showing that refines-to is a precongruence, we use a variation on fine- 
grain systems to construct the general class of environments that we show pre- 
congruence with respect to. See Sec. 6 for the definition of the variation and 
Sec. 7 for the precongruence proof. 


Evaluating Goodness of Refines-to. A precongruence property is generally de- 
sirable for any refinement relation, but isn’t sufficient by itself to justify the 
relation’s definition. To take an extreme example, an always true refinement re-
lation is indeed a precongruence, but it provides no substitutivity guarantees at
all. We also must look at the environment beyond the boundaries of the system 
we model formally, and consider the expectations this environment has. 

Sometimes, for example in the process algebra community, these expectations 
are formalized by developing a theory of testing [6] and showing (at least) that 
any more refined system passes all tests that a more abstract system passes. The 
hope is that it is more straightforward to agree that a testing theory adequately 
captures the expectations of an external environment than to agree that the 
refinement relation does. 

We haven’t developed a testing theory, and instead simply discuss the expec- 
tations we might reasonably have of sequential reactive systems. We do observe 
that the total-correctness proof obligations adopted in VDM for showing that 
one sequential program is a data reification of another are a consequence of 
our definition of refines-to. Also, it would be easy to derive the similar VDM 
obligations for showing an implementation meets a specification. 


Use of a Theorem Proving System. We see all of the Pvs [13] formalization work 
described in this paper as being necessary support material for case studies in 
verifying actual systems. 

It’s worth mentioning too that we also found the use of Pvs a significant help 
in clarifying what definitions were necessary, how lemmas should be phrased, and 
how proofs should go. At the same time, we found the main proofs sufficiently 
intricate and the weight in the formal notation sufficiently high that in many 
cases it was necessary to sketch proofs on paper before or at the same time
as attempting the Pvs proofs.


An Illustrative Example. We show in Sec. 8 a specification of an abstract data 
type of sets as a medium grain system, and an implementation as a fine grain 
system. 


2 Related Work 


The use of coinduction to define refinement relations has been made popular by 
the process algebra community [12]. 

Jacobs in [9] characterises classes in object-oriented languages as coalgebraic 
categories, and uses a coinductive notion of refinement to specify correctness of 
implementations. His approach is more general than ours in that he allows for 
changes in the system interface in going from abstract to concrete. However, 
he takes a simpler view of systems: he models them using total functions so 
non-determinacy is not possible, and he doesn’t take account of any input pre- 
conditions that might need to be satisfied for termination. We imagine it would 
be possible to adapt these extra features that we consider into his framework. 
This work is also being implemented in Pvs. 

We originally considered a coinductive definition of a refinement relation that 
allows contravariance on inputs after seeing Abramsky discuss such a relation [1]. 
The relation he considers is on labelled transition systems with input and output 
labels on the transitions. He also has a game-theoretic version that applies to 
prefix-closed sequences of input/output behaviour. One limitation of his relation 
is that it captures partial, not total correctness. 

A formalism for concurrent systems that allows contravariance on inputs and 
prevents restriction of environment behaviour by the system is that of alterna- 
ting refinement relations [3]. This work also uses coinductive characterisation of 
refinement. To tackle concurrency issues, its definition is more elaborate than 
ours. For example, the nesting depth of alternations of quantifiers is four, com-
pared to two in our case. In the reactive modules [2] formalism being pursued by a
subset of the authors of [3], a notion of temporal abstraction is defined, much like 
our map from fine to medium grain systems, that can hide internal computation 
steps of system components. 

In the literature on refinement of sequential programs (see [4] for a recent 
comprehensive survey), our approach is closest to that taken in VDM [10]. Our 
medium grain systems exactly correspond to the precondition and VDM post 
condition¹ style specifications.

Early work of Milner [11] looks at denotational semantics for transducers 
which are effectively the same as our medium grain systems. To our knowledge, 
Milner never proposed a coinductive definition of refinement for transducers, 
though he deployed coinductive definitions heavily in his concurrency theory 


¹ Relations on inputs and outputs, rather than just relations on outputs as in Hoare
style specifications.

work on labelled transition systems. There the distinction between inputs and
outputs is erased at the level much of the semantics work is carried out, so the 
opportunity we take to treat them differently is lost. 


3 Medium Grain Systems 


A medium grain system is a full description of the behaviour of a non-determi- 
nistic sequential reactive system from an input/output and termination point of 
view. The intent is that both system specifications and implementations can be 
phrased as medium grain systems. 

The type Med_gs of medium grain systems, parameterized by types I and O
of input and output values and type Q of internal states, is defined as a subtype
of a record type:


Med_gs[Q,I,O] : TYPE =
  {s : (pre ⊆ Q × I,
        trans ⊆ Q × I × O × Q )
   |
   ∀p,i. s.pre(p,i) ⇒ ∃q,o. s.trans(p,i,o,q) }.


The notation ‘fieldname ⊆ Type’ abbreviates ‘fieldname : P(Type)’ where
P is the powerset (set of subsets) operator. Subsets of a type T are represented
as functions of type T → bool, so membership of an element x in a subset s
is expressed as function application s(x), a notation in keeping with the corre- 
spondence between subsets and predicates. 

The field trans specifies what transitions the system can make. The relation 
trans(p,i,o,q) indicates that, starting from state p and presented with input 
value i, it is possible for the internal computations of the system to eventually 
terminate in state q and for the system to return output value o. Because of 
non-determinism there might be more than one q and o for given p and i. The 
field pre specifies a precondition. The relation pre(p,i) indicates that starting 
from state p and presented with input i, the internal computations of the system 
are guaranteed to terminate in some state from which output is generated. In 
general it is not equivalent to ∃q,o. trans(p,i,o,q) but stronger. Even if a
system can reach q and output o from a given state p and input i, because of 
non-determinism, it might also deadlock or go into a divergent computation. 

It would be convenient to include the type parameter Q as an initial field of 
the record type in the definition of Med_gs. However this is not possible in the 
Pvs specification language. 

To fully describe in Pvs a medium grain system, we sometimes augment the 
presentation of the system as an element of Med_gs by identifying some element 
of Q as the system’s initial state. 

We imagine interactions between an environment and a medium grain system 
as a continuing dialogue: if the thread of control is with the environment, the 
environment can choose to provide the system with some input. The system then 
processes this input and possibly eventually generates some output. Control then 
passes back to the environment which is free to choose some further input for
the system. 

For some purposes, we make the assumption that the environment has no 
ability to access or modify the internal system state. The environment might 
only know that the system initially started off in some well-characterized state. 

We imagine that a medium grain system being used as a specification will ex- 
hibit a range of possible behaviour on a given input that is only dependent on the 
initial state and the observed input/output behaviour in between. We consider
a system with this property to be coarse grain. Coarse grainness is a desirable 
property for specifications. Coarse grainness corresponds to determinacy in the 
CCS process calculus [12]. We haven’t yet made any use of coarse grainness in 
our work. 


4 Definition of Refinement 


Our definition of what it means for one system to be a refinement of another is 
in the style of the coinductive definition of bisimulation [12]. 

Fix on an abstract medium grain system sa and a concrete medium grain
system sc with distinct internal states Qa and Qc and both over input type I
and output type O: in Pvs, they have respective types Med_gs[Qa,I,O] and
Med_gs[Qc,I,O].

A relation R ⊆ Qa × Qc is a refinement relation from sa to sc iff it satisfies


R(pa,pc) ⇒                                              (1)
  ∀i. sa.pre(pa,i) ⇒
        sc.pre(pc,i)
      ∧ ∀qc,o. sc.trans(pc,i,o,qc) ⇒
          ∃qa. sa.trans(pa,i,o,qa) ∧ R(qa,qc)


for any states pa and pc. 
System sa in initial state inita refines to system sc in initial state initc, 
written 


refines_to(sa,sc)(inita,initc), 


iff there exists a refinement relation R such that R(inita,initc). The relation 
refines_to(sa,sc) is easily shown itself to be a refinement relation, and so by 
this definition it is the greatest refinement relation, adopting the usual ordering 
of relations by inclusion. 

We realise the definition of refines_to in Pvs as the greatest fixed point
of the appropriate functional. Pvs doesn’t provide direct support for such
coinductive definitions. However, we easily prove a lattice-theoretic version of the
Tarski-Knaster fixed-point theorem and specialise it for the creation of
coinductive definitions over the lattice of subsets of a type.
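
As an illustration only, and not part of the Pvs development, the greatest fixed point can be computed directly when the state, input and output types are finite. The sketch below assumes a hypothetical encoding in which a medium grain system is a Python dict holding `pre` as a set of (state, input) pairs and `trans` as a set of (state, input, output, state) tuples. The names `step_ok`, `greatest_refinement`, `spec` and `impl` are ours. Starting from the full relation Qa × Qc, pairs that violate condition (1) are discarded until nothing changes.

```python
from itertools import product

def step_ok(sa, sc, Qa, I, R, pa, pc):
    """Right-hand side of condition (1) for one pair, relative to candidate R."""
    for i in I:
        if (pa, i) not in sa['pre']:
            continue                      # no obligation outside sa.pre
        if (pc, i) not in sc['pre']:
            return False
        for (p, j, o, qc) in sc['trans']:
            if (p, j) != (pc, i):
                continue
            # some abstract move with the same output must land back in R
            if not any((pa, i, o, qa) in sa['trans'] and (qa, qc) in R
                       for qa in Qa):
                return False
    return True

def greatest_refinement(sa, sc, Qa, Qc, I):
    """Iterate downwards from the top of the lattice of relations."""
    R = set(product(Qa, Qc))
    while True:
        smaller = {(pa, pc) for (pa, pc) in R
                   if step_ok(sa, sc, Qa, I, R, pa, pc)}
        if smaller == R:
            return R
        R = smaller

def refines_to(sa, sc, Qa, Qc, I, inita, initc):
    return (inita, initc) in greatest_refinement(sa, sc, Qa, Qc, I)

# Hypothetical example: the abstract system may answer 0 or 1,
# the concrete system always answers 0.
spec = {'pre': {('a', 'go')},
        'trans': {('a', 'go', 0, 'a'), ('a', 'go', 1, 'a')}}
impl = {'pre': {('c', 'go')},
        'trans': {('c', 'go', 0, 'c')}}
```

In this example `spec` refines to `impl`, since the concrete system merely resolves the non-determinism, whereas `impl` does not refine to `spec`, because the output 1 has no counterpart.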

Trivially we show that refines_to is a preorder, that is, it is reflexive and 
transitive.

Why is this definition plausible? Assume we have found a refinement relation
R, system sa is in some state pa, system sc is in some state pc, and R(pa,pc) 
holds. We then know for a start that 


∀i. sa.pre(pa,i) ⇒ sc.pre(pc,i).                        (2)


This is very reasonable: system sc is guaranteed to converge to an output on 
every input that sa converges on. System sc might also converge on other
inputs, but that doesn’t matter here.

We also know 


∀i,qc,o. sa.pre(pa,i) ∧ sc.trans(pc,i,o,qc) ⇒           (3)
  ∃qa. sa.trans(pa,i,o,qa) ∧ R(qa,qc).


Any output that the concrete system generates on an abstractly acceptable input 
is also an abstractly acceptable output. The concrete system’s output behaviour 
is always what we might expect. The concrete system might exhibit less non-de- 
terminism, completely in line with the approach in specification of introducing 
non-determinism, not because that non-determinism is expected in any one im- 
plementation, but in order to permit flexibility in implementation. However (2) 
guarantees that there always is some output that the concrete system genera- 
tes. Importantly too from (3) we know R(qa,qc) holds, so we also expect all 
subsequent I/O behaviour of the concrete system to be in accordance with the 
abstract system behaviour. 

We make the assumption above that an environment would never want to 
supply a system with input when there is not a firm expectation that the system 
will eventually generate some output given that input. We are not trying to 
define a notion of refinement that is to be used when thinking about the fault 
tolerance of systems or about systems that have divergent computations in the 
normal course of events. 

Having said that, this definition of refinement should also be applicable if 
only partial correctness were of interest. There is nothing intrinsic in the defi- 
nition itself that refers to total correctness. However for partial correctness one 
would want to discard the subtyping condition we have used in the definition of 
the Med_gs type that requires at least one output value to exist whenever the 
precondition is satisfied. 

A common approach to establishing a refinement relationship between an 
abstract and concrete system involves introducing a function rmap of type Qc 
→ Qa (sometimes known as a refinement mapping, abstraction map or retrieve
function) and an invariant on concrete states c_inv ⊆ Qc. The function rmap
and predicate c_inv define a refinement relation:


R(pa,pc) : bool = c_inv(pc) ∧ pa = rmap(pc).


Specialising (1), a predicate stating that rmap and c_inv form a refinement 
relation is 


refmap_step(sa,sc,c_inv,rmap) : bool =
  ∀pc,i.
    c_inv(pc) ∧ sa.pre(rmap(pc),i) ⇒
        sc.pre(pc,i)
      ∧ ∀qc,o. sc.trans(pc,i,o,qc) ⇒
          sa.trans(rmap(pc),i,o,rmap(qc)) ∧ c_inv(qc),


and the coinduction principle that goes with refines_to specialises to the
theorem refines_to_ind_with_refmap_a:


⊢ c_inv(initc) ∧ inita = rmap(initc)
  ∧ refmap_step(sa,sc,c_inv,rmap)
  ⇒
  refines_to(sa,sc)(inita,initc).


We observe that the antecedents of this theorem are exactly a strict subset of 
the proof obligations in VDM [10] for establishing a data reification relationship 
between an abstract data type and its implementation when there is also an 
invariant on the concrete type. 

The extra proof obligation in [10] concerns adequacy. In our notation: 


∀qa. ∃qc. c_inv(qc) ∧ qa = rmap(qc).


This is usually desirable because it says that every abstract value has at least 
one concrete representation. However it is not necessary for showing 


refines_to(sa,sc) (inita,initc). 


If it so happens that in the abstract system sa starting from state inita we 
cannot access every abstract state by some sequence of input values, then ade- 
quacy needn’t hold for the inaccessible states. This is unlikely to happen if the 
abstract system is an initial specification, but it could reasonably happen if it is 
a system at some intermediate level of refinement. 

We also observe that if the preconditions sa.pre and sc.pre in refmap_step 
are always true, the theorem refines_to_ind_with_refmap_a reduces to the in- 
duction principle commonly used when refinement is defined as trace inclusion. 
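
For finite systems the predicate refmap_step can be checked by enumeration. The following sketch is illustrative only and assumes a hypothetical encoding: systems are dicts with `pre` and `trans` sets, `c_inv` is a set of concrete states, and `rmap` is a dict from concrete to abstract states. The parity counter used as the example is our own invention.

```python
def refmap_step(sa, sc, c_inv, rmap, Qc, I):
    """Enumerative check of refmap_step on finite systems."""
    for pc in Qc:
        for i in I:
            if pc not in c_inv or (rmap[pc], i) not in sa['pre']:
                continue                      # antecedent is false
            if (pc, i) not in sc['pre']:
                return False
            for (p, j, o, qc) in sc['trans']:
                if (p, j) != (pc, i):
                    continue
                if (rmap[pc], i, o, rmap[qc]) not in sa['trans']:
                    return False              # abstract system cannot match
                if qc not in c_inv:
                    return False              # invariant not preserved
    return True

# Hypothetical example: a counter modulo 4 implementing a parity bit.
Qa, Qc, I = {0, 1}, {0, 1, 2, 3}, {'tick'}
parity = {'pre': {(p, 'tick') for p in Qa},
          'trans': {(p, 'tick', (p + 1) % 2, (p + 1) % 2) for p in Qa}}
counter = {'pre': {(n, 'tick') for n in Qc},
           'trans': {(n, 'tick', (n + 1) % 2, (n + 1) % 4) for n in Qc}}
good_map = {n: n % 2 for n in Qc}   # retrieve function
bad_map = {n: 0 for n in Qc}        # forgets the parity
```

With `good_map` and the trivial invariant the check succeeds. With `bad_map` it fails at the first transition, because the abstract successor state does not agree with the image of the concrete successor.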


5 Fine Grain Systems 


5.1 Definition of Fine Grain System 


A fine grain system is a system description that allows the presentation of the in- 
dividual computation steps that a system can perform. It is a suitable formalism 
for describing system implementations: see Sec. 8 where an example is given of 
describing the procedures in an imperative implementation of an abstract data 
type as a fine grain system. 

We build on the definition of a fine grain system when defining later the 
contexts or environments that some medium grain system may be operating in. 
See Sec. 6 and Sec. 7.

We define a type Fin_gs of fine grain systems as


Fin_gs[Q,I,O] : TYPE =
  {s : ( run ⊆ Q,
         input : (Q × I) → Q,
         step ⊆ Q × Q,
         output : Q → O,
         wbehaved ⊆ Q )
   |
   ∀p,q. s.step(p,q) ⇒ s.run(p) }


with parameters Q, I, and O as in the definition of the Med_gs type in Sec. 3. 

Initially a fine grain system is in some state for which run is false. When 
an input value is presented to a fine grain system, the system uses input to 
transition to a new state. The system then uses step to repeatedly make non- 
deterministic internal transitions. As specified by the subtyping predicate, steps 
can only be taken from states that satisfy run. The system halts and uses output 
to generate an output value if and when it reaches a state for which run is false. 

A system can deadlock, that is, reach a state p for which run is true but
¬∃q. step(p,q). Deadlock might seem an unusual feature to have in a model of a
sequential system, but it is a natural phenomenon for guarded transition systems
to exhibit. Deadlock is one appropriate behaviour for handling exceptional si- 
tuations without extra machinery in the formalism. And checks for its absence 
can reveal bugs in system descriptions. 

A system can also diverge, that is, perform steps ad infinitum without ever
reaching a state in which run is false.

Once halted, a system is then ready to be reactivated by a further input. 
The predicate wbehaved identifies those states from which any step by step is 
guaranteed to be well-behaved. A step might not be well-behaved if it involves 
interacting with a subsystem. 

As with medium grain systems, to fully specify a fine grain system we often 
also identify some element of Q as the system’s initial state. 


5.2 Abstraction from Fine to Medium Grain 


To form the input/output medium grain view of a fine grain system s with type
Fin_gs[Q,I,O], we use the fine to medium grain map:


map_fm(s) : Med_gs[Q,I,O] =
  ( pre := mgs_pre(s),
    trans := mgs_trans(s) ),


where 


mgs_pre(s)(p,i) : bool =
    progressive?(s)(s.input(p,i))
  ∧ ¬inf_chain(s.step)(s.input(p,i))


mgs_trans(s)(p,i,o,q) : bool =
    star(s.step)(s.input(p,i),q)
  ∧ ¬s.run(q)
  ∧ o = s.output(q)


at_progressive?(s)(q) : bool =
  s.run(q) ⇒ s.wbehaved(q) ∧ ∃r. s.step(q,r)


progressive?(s)(q) : bool =
  ∀r. star(s.step)(q,r) ⇒ at_progressive?(s)(r).


Here we draw on an auxiliary development of properties of finite and infinite 
sequences of values where adjacent values are related by a binary relation R. The 
relation star(R) is the reflexive transitive closure of R. The predicate instance 
inf_chain(R)(x) indicates that there exists an infinite chain of R-linked values
starting from x. If star(s.step)(q,r), then by the subtype property of fine
grain systems, every state on any path from q up to but excluding r is a run 
state. A state is progressive? if every run state accessible by stepping through 
only run states is both well behaved and not deadlocked. The predicate name 
at_progressive? is an abbreviation for ‘atomically progressive?’. The predicate 
mgs_pre identifies exactly those inputs of the fine grain system for which no
divergence is possible and it is guaranteed that the system will eventually reach 
a halting state. It might be difficult when reasoning with actual systems to 
work with this definition of mgs_pre, and simpler to use instead some predicate 
known to be stronger than that given here. The predicate mgs_trans specifies 
what outputs the fine grain system might generate for each input. 

With the typing of the map_fm definition, the Pvs type checker automatically 
generates a TCC (type correctness condition) that requires us to check that the 
subtype predicate in the Med_gs definition is satisfied. 
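
When the state space is finite, the definitions of this subsection can be executed directly, which may help in reading them. The sketch below is an assumption-laden illustration: a fine grain system is a hypothetical dict with `run` and `wbehaved` as sets, `step` as a set of pairs, and `input` and `output` as Python functions. On a finite state space an infinite chain exists exactly when a cycle is reachable, and `inf_chain` exploits this.

```python
def star(step, start):
    """States reachable from start in zero or more steps."""
    seen, todo = {start}, [start]
    while todo:
        p = todo.pop()
        for (a, b) in step:
            if a == p and b not in seen:
                seen.add(b)
                todo.append(b)
    return seen

def inf_chain(step, start):
    """Greatest set of reachable states that always have a successor in the set."""
    alive = star(step, start)
    changed = True
    while changed:
        changed = False
        for p in list(alive):
            if not any(a == p and b in alive for (a, b) in step):
                alive.discard(p)
                changed = True
    return start in alive

def at_progressive(s, q):
    if q not in s['run']:
        return True
    return q in s['wbehaved'] and any(a == q for (a, _) in s['step'])

def progressive(s, q):
    return all(at_progressive(s, r) for r in star(s['step'], q))

def mgs_pre(s, p, i):
    q0 = s['input'](p, i)
    return progressive(s, q0) and not inf_chain(s['step'], q0)

def mgs_trans(s, p, i):
    """All (output, final state) pairs related to (p, i) by mgs_trans."""
    q0 = s['input'](p, i)
    return {(s['output'](q), q)
            for q in star(s['step'], q0) if q not in s['run']}

# Hypothetical example: a countdown from the input value, plus a
# diverging state 5 and a deadlocked state 7.
demo = {'run': {1, 2, 3, 5, 7},
        'step': {(3, 2), (2, 1), (1, 0), (5, 5)},
        'wbehaved': set(range(8)),
        'input': lambda p, i: i,
        'output': lambda q: 'done'}
```

In the example, input 3 satisfies the precondition and yields the single outcome ('done', 0). Input 5 fails it because of divergence, and input 7 fails it because of deadlock.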


6 Parameterized Fine Grain Systems 


6.1 Definition of Parameterized Fine Grain System 


A parameterized fine grain system is an adaptation of a fine grain system that 
can feed inputs to and receive outputs from a medium grain subsystem. The 
use of it in this paper is as a general description of contexts that medium grain 
systems might operate in. The type of parameterized fine grain systems is: 


Prm_fin_gs[Q,I,O,Ix,Ox] : TYPE =
  {s : ( run ⊆ Q,
         input : (Q × I) → Q,
         output : Q → O,
         i_step ⊆ Q × Q,
         x_en ⊆ Q,
         x_input : x_en → Ix,
         x_output : (Q × Ox) → Q )
   |
   ∀p. s.x_en(p) ∨ (∃q. s.i_step(p,q)) ⇒ s.run(p) },


where the type parameters for Prm_fin_gs are Q for internal states, I for values
input from the environment, O for values output to the environment, Ix for values
fed to the medium grain subsystem, and Ox for values received back from the
subsystem. The fields run, input and output are as for a fine grain system. The 
relation i_step is for internal steps, and the predicate x_en, function x_input 
and function x_output are for the interface to the subsystem. Their use will 
become clear in the next subsection. 


6.2 Instantiation of Parameterized Fine Grain System 


Here we show how to combine a parameterized fine grain system s with a medium 
grain system x to create an unparameterized fine grain system. There are several 
options as to how the internal state spaces of s and x are related. In general 
they might share some state and also each have some distinct private state. Our 
immediate interest is in the situation where the only interaction can be via x’s 
input/output interface, so we choose to keep the state spaces distinct. Let Q 
be the state space of s and Qx the state space of x. The type of states for the 
combined system is Q × Qx.

Let the type parameters I, O, Ix, and Ox be defined as in Sec. 6.1. The
parameterised fine grain system s then has type Prm_fin_gs[Q,I,O,Ix,Ox] and
the medium grain system x has type Med_gs[Qx,Ix,Ox]. The map instantiating
s with subsystem x has definition:


m_ipfd(s, x) : Fin_gs[(Q × Qx),I,O] =
  ( run := λ(q,qx). s.run(q),
    input := m_ipfd_input(s,x),
    step := m_ipfd_step(s,x),
    output := λ(q,qx). s.output(q),
    wbehaved := m_ipfd_wbehaved(s,x) ),


where 


m_ipfd_step(s, x)(ppx,qqx) : bool =
  let (p,px) = ppx, (q,qx) = qqx in
      s.i_step(p,q) ∧ px = qx
    ∨ s.x_en(p) ∧ ∃ox. x.trans(px, s.x_input(p), ox, qx)
                      ∧ q = s.x_output(p,ox)


m_ipfd_input(s, x)(qqx,i) : Q × Qx =
  let (q,qx) = qqx in (s.input(q,i), qx)


m_ipfd_wbehaved(s, x)(q,qx) : bool =
  s.x_en(q) ⇒ x.pre(qx, s.x_input(q)).


Here ‘m_ipfd’ stands for ‘map instantiating parameterised fine grain system 
keeping states distinct’. In the definition of m_ipfd_step, we see how x_en is 
used to identify when calls to the subsystem are enabled. In most sensible sy- 
stems, we expect that it will never be possible to take an i_step when x_en 
is true, but we haven’t found a need yet to specify this requirement. The fun- 
ction x_input is used to feed inputs to the subsystem, and function x_output 
processes the resulting outputs from the subsystem. 

When a system is modelled as the parameterized fine grain system s, we 
assume that the granularity of i-steps is chosen sufficiently finely that, within 
the system behaviour modelled by a single i_step, there is no possibility for 
divergence or deadlock. Therefore, in defining m_ipfd_wbehaved, we need only 
consider that bad behaviour of m_ipfd_step can result if x is called when x.pre 
is false. 

The map mm_map combines m_ipfd with the fine to medium grain map defined
previously:


mm_map(s : Prm_fin_gs[Q,I,O,Ix,Ox])(x : Med_gs[Qx,Ix,Ox])
  : Med_gs[(Q × Qx),I,O]
  = map_fm(m_ipfd(s, x)).
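
The three components of m_ipfd can be transcribed almost literally for finite systems. The sketch below is illustrative and assumes a hypothetical encoding: the parameterized system `s` is a dict with `i_step` and `x_en` as sets and `input`, `x_input`, `x_output` as functions, and the subsystem `x` is a dict with `pre` and `trans` sets. The example context `ctx` and subsystem `ctr` are our own.

```python
def m_ipfd_step(s, x, ppx, qqx):
    (p, px), (q, qx) = ppx, qqx
    # internal step: the subsystem state is left untouched
    internal = (p, q) in s['i_step'] and px == qx
    # subsystem call: x moves from px to qx and its output determines q
    call = (p in s['x_en'] and
            any(q == s['x_output'](p, ox)
                for (p0, ix, ox, q0) in x['trans']
                if p0 == px and ix == s['x_input'](p) and q0 == qx))
    return internal or call

def m_ipfd_input(s, qqx, i):
    q, qx = qqx
    return (s['input'](q, i), qx)

def m_ipfd_wbehaved(s, x, qqx):
    q, qx = qqx
    return q not in s['x_en'] or (qx, s['x_input'](q)) in x['pre']

# Hypothetical context: on any input it calls the subsystem once with
# 'inc' and halts in a state that records the answer.
ctx = {'i_step': set(),
       'x_en': {'call'},
       'input': lambda q, i: 'call',
       'x_input': lambda p: 'inc',
       'x_output': lambda p, ox: 'done%d' % ox}
# Hypothetical subsystem: a counter that can be incremented up to 3.
ctr = {'pre': {(n, 'inc') for n in range(3)},
       'trans': {(n, 'inc', n + 1, n + 1) for n in range(3)}}
```

Here the combined state ('call', 0) can step to ('done1', 1), and the state ('call', 3) is not well behaved because the counter's precondition fails there.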


7 Refinement is Precongruence 


Let ps be a parameterized fine grain system of type Prm_fin_gs[Q,I,O,Ix,Ox]
with initial state q, let sa be an abstract medium grain subsystem of type
Med_gs[Qa,Ix,Ox] with initial state qa, and let sc be a more concrete
medium grain subsystem of type Med_gs[Qc,Ix,Ox] with initial state qc.
The lemma precong_lemma:


⊢ refines_to(sa,sc)(qa,qc) ⇒
    refines_to(mm_map(ps)(sa), mm_map(ps)(sc))((q,qa), (q,qc))


states that the refines_to relation is a precongruence. 
The proof is by coinduction, using the ‘refines_to candidate’: 


rt_cand(sa,sc)(qqa,qqc) : bool =
  qqa.1 = qqc.1 and refines_to(sa,sc)(qqa.2,qqc.2)


to instantiate the coinduction lemma. Key foundational lemmas in the proof are 


⊢ rt_cand(sa,sc)(qqa,qqc)                               (4)
  ∧ at_progressive?(m_ipfd(ps,sa))(qqa)
  ∧ m_ipfd(ps,sc).step(qqc,rrc)
  ⇒
  ∃rra. m_ipfd(ps,sa).step(qqa,rra)
        ∧ rt_cand(sa,sc)(rra,rrc)


⊢ rt_cand(sa,sc)(qqa,qqc)
  ∧ at_progressive?(m_ipfd(ps,sa))(qqa)
  ⇒
  at_progressive?(m_ipfd(ps,sc))(qqc).


Key intermediary lemmas are 


⊢ rt_cand(sa,sc)(qqa,qqc)
  ∧ progressive?(m_ipfd(ps,sa))(qqa)
  ∧ ¬inf_chain(m_ipfd(ps,sa).step)(qqa)
  ⇒
  ¬inf_chain(m_ipfd(ps,sc).step)(qqc),


proven by coinduction on inf_chain(m_ipfd(ps,sa).step), and


⊢ rt_cand(sa,sc)(qqa,qqc)
  ∧ progressive?(m_ipfd(ps,sa))(qqa)
  ∧ star(m_ipfd(ps,sc).step)(qqc,rrc)
  ⇒
  ∃rra. star(m_ipfd(ps,sa).step)(qqa,rra)
        ∧ rt_cand(sa,sc)(rra,rrc),


proven by induction on star(m_ipfd(ps, sc).step) using an inductive cha- 
racterisation of star(R) with R steps successively added on the left, and use of 
lemma (4) above. 


8 Example Specification 


We give here an example of a specification of an ADT (abstract data type) of 
finite sets as a medium grain system, and an implementation as a fine grain 
system. We show the correctness statement for the implementation in terms of 
our refines_to relation.


8.1 Sets Specification 


We consider the ADT to be parameterised by a type T of elements, to have
operators:


bool empty ()       test if empty
void insert (T)     insert a possibly new element
void remove (T)     remove an existing element
T    choose ()      choose an element
bool member (T)     test if an element is in the set


and to have a constructor null for the empty set.
We introduce datatypes for the input and outputs of both fine and medium 
grain systems.


IType [T:TYPE+] : DATATYPE
BEGIN
  i_empty : i_empty?
  i_insert(i_insert_arg : T) : i_insert?
  i_remove(i_remove_arg : T) : i_remove?
  i_choose : i_choose?
  i_member(i_member_arg : T) : i_member?
END IType


OType [T:TYPE+] : DATATYPE
BEGIN
  o_empty(o_empty_val : bool) : o_empty?
  o_insert : o_insert?
  o_remove : o_remove?
  o_choose(o_choose_val : T) : o_choose?
  o_member(o_member_val : bool) : o_member?
END OType


Such datatype statements in Pvs declare constructors, recognisers, and field 
selectors, and introduce various auxiliary definitions and property axioms. 
The medium grain system for sets is: 


a_sys : Med_gs[AState,IType,OType] =
  ( pre := a_pre, trans := a_trans ),


where 


AState : TYPE = P(T)


a_trans(p,ip,op,q) : bool =
  cases ip of
    i_empty     : q = p ∧ op = o_empty(empty?(p)),
    i_insert(x) : q = add(x,p) ∧ op = o_insert,
    i_remove(x) : member(x,p) ∧
                  q = remove(x,p) ∧ op = o_remove,
    i_choose    : nonempty?(p) ∧
                  q = p ∧ op = o_choose(choose(p)),
    i_member(x) : q = p ∧ op = o_member(member(x,p))
  endcases


a_pre(p,ip) : bool =
  cases ip of
    i_empty     : true,
    i_insert(x) : true,
    i_remove(x) : member(x,p),
    i_choose    : nonempty?(p),
    i_member(x) : true
  endcases.

An initial state for the empty set is:

a_null : AState = emptyset[T].


Here we have employed definitions such as member, remove and emptyset from 
Pvs’s standard sets-as-predicates library. 

Note that we only specify that the remove operation be well behaved and
terminate if the element we are trying to remove is indeed
initially contained in the set.

We introduce a non-deterministic choose operation to pick some element 
from a set. It is defined in terms of the choose function on sets, which in turn 
makes use of the Hilbert epsilon operator in Pvs’s type theory. No requirement 
is placed on the behaviour of choose if the set is empty. 
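
The specification can be mimicked over Python frozensets to experiment with it. This is a sketch under stated assumptions, not the Pvs text: inputs and outputs are hypothetical tuples such as `('insert', 3)`, and `a_trans` returns the set of permitted (output, next state) pairs rather than a four-place relation. One deliberate deviation is that choose is read here as permitting any member of the set, whereas the Pvs definition fixes a single element through the epsilon-based choose function.

```python
def a_pre(p, ip):
    op, *arg = ip
    if op == 'remove':
        return arg[0] in p
    if op == 'choose':
        return len(p) > 0
    return op in ('empty', 'insert', 'member')

def a_trans(p, ip):
    """Set of (output, next_state) pairs allowed from state p on input ip."""
    op, *arg = ip
    if op == 'empty':
        return {(('empty', len(p) == 0), p)}
    if op == 'insert':
        return {(('insert',), p | {arg[0]})}
    if op == 'remove':
        # outside the precondition no transition is offered
        return {(('remove',), p - {arg[0]})} if arg[0] in p else set()
    if op == 'choose':
        # assumption: any element may be returned (see the lead-in)
        return {(('choose', e), p) for e in p}
    if op == 'member':
        return {(('member', arg[0] in p), p)}
    return set()

a_null = frozenset()
```

Every input satisfying `a_pre` has at least one outcome in `a_trans`, which corresponds to the subtype condition on Med_gs mentioned in Sec. 4.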


8.2 Sets Implementation 


We base our implementation on lists. We will require these lists to not contain 
duplicates when we come to proving the correctness statement we show in the 
next subsection. 

The type of states is: 


CFState : TYPE =
  ( pc : nat,
    sys_input : IType,
    set : list[T],
    tvar : T,
    tsvar : list[T],
    bvar : bool )


When executing an operation, we keep the input value to the fine grain system 
stored in the field sys_input. This field not only holds the input value (if any) 
of the operation, but also indicates which operation is currently executing. We 
use the predicate: 


at_proc(p : pred[IType])(u) : bool = p(u.sys_input) 


to indicate which procedure we are currently in. 

The field pc is the program counter, field set holds the list representation of 
the set, and fields tvar, tsvar, and bvar are temporary variables intended for 
use within operations. 

The system definition is: 


cf_sys : Fin_gs[CFState,IType,OType] =
  ( run := cf_run,
    input := cf_input,
    step := cf_step,
    output := cf_output,
    wbehaved := λu. true ).

The system is in a run state when the pc is non-zero:

cf_run(u) : bool = u.pc > 0.

Operations always start with a pc of 1:

cf_input(u,ip) : CFState = u with [sys_input := ip, pc := 1].

The step relation is composed from step relations for each operation:


cf_step(u,v) : bool =
  cases u.sys_input of
    i_empty     : cf_empty_step(u,v),
    i_insert(x) : cf_insert_step(u,v),
    i_remove(x) : cf_remove_step(u,v),
    i_choose    : cf_choose_step(u,v),
    i_member(x) : cf_member_step(u,v)
  endcases.


Examples of step relations for operations are: 


cf_remove_step(u,v) : bool =
    cf_remove_step_1(u,v)
  ∨ cf_remove_step_2(u,v)
  ∨ cf_remove_step_3(u,v)
  ∨ cf_remove_step_4(u,v)
  ∨ cf_remove_step_5(u,v)


cf_remove_step_2(u,v) : bool =
  at_pc(2)(u) ∧ cons?(u.set) ∧ v = u with [pc := 3]


cf_remove_step_4(u,v) : bool =
    at_pc(3)(u)
  ∧ cons?(u.set)
  ∧ car(u.set) = i_remove_arg(u.sys_input)
  ∧ v = u with [pc := 2, set := cdr(u.set)]


cf_choose_step(u,v) : bool =
    at_pc(1)(u)
  ∧ cons?(u.set)
  ∧ v = u with [pc := 0, tvar := car(u.set)]


Note how we implement choose by simply returning the head element of the 
list. 

Selector functions on Pvs datatypes are partial functions, only total on the
relevant subtype of the datatype. For example, above, remove_step_2 is the only
way of reaching at_pc(3), but to make remove_step_4 type check in Pvs, we
have to again check the cons?ness of u.set. We could regard the possibility of
deadlock that this repeated check implies as a way of modelling the exception
that might be raised in actual code if we reached step 4 and u.set were not a
cons.
The output function is: 


cf_output(u) : OType = 
cases u.sys_input of 
i_empty : o_empty(null?(u.set)), 
i_insert(x) : o_insert, 
i_remove(x) : o_remove, 


i_choose : o_choose(u.tvar), 
i_member(x) : o_member(u.bvar) 
endcases. 


The correctness of the output function relies on the preservation of the sys_input 
field from when an operation is started. 
The initial state of the implementation is: 


cf_null : CFState =
  ( pc := 0,
    sys_input := i_empty,
    set := null[T],
    tvar := εx. true,
    tsvar := null[T],
    bvar := false ).


The only important values here are for pc and set. The other values are just 
placeholders. 
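
To see the run, input, step and output fields working together, the choose operation can be traced in a few lines. The sketch below is an illustration under assumptions: the record is a hypothetical Python dict, the step relation is rendered as a function returning the list of successor states, and `run_choose` is a driver of our own that is not part of the Pvs text. Only the choose step is modelled, since it is the only operation whose steps are given in full above.

```python
def cf_input(u, ip):
    # operations always start with a pc of 1
    return {**u, 'sys_input': ip, 'pc': 1}

def cf_run(u):
    return u['pc'] > 0

def cf_choose_step(u):
    """Successor states of u under the choose step (empty list on deadlock)."""
    if u['pc'] == 1 and u['set']:
        return [{**u, 'pc': 0, 'tvar': u['set'][0]}]
    return []

def cf_output(u):
    # only the i_choose case of the output function is modelled here
    return ('choose', u['tvar'])

def run_choose(u):
    """Feed a choose input and step until halted; None signals deadlock."""
    u = cf_input(u, ('choose',))
    while cf_run(u):
        succ = cf_choose_step(u)
        if not succ:
            return None
        u = succ[0]
    return cf_output(u)

cf_null = {'pc': 0, 'sys_input': ('empty',), 'set': [],
           'tvar': None, 'tsvar': [], 'bvar': False}
```

On a state whose list is [4, 2] the driver returns the head element 4. On `cf_null` the system deadlocks, which is consistent with the abstract precondition for choose being false on the empty set.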


8.3 Correctness Statement


We construct a medium grain black-box abstraction of this system as follows:


CMState : TYPE = CFState,
cm_sys : Med_gs[CMState,IType,OType] = map_fm(cf_sys),
cm_null : CMState = cf_null.


The theorem that our implementation is a correct implementation of the sets
specification is then:


⊢ refines_to(a_sys, cm_sys)(a_null, cm_null).


9 Conclusions 


We have introduced a refinement relation refines_to that can be used in
correctness specifications for a very general class of programs. For example it is
applicable to abstract data types, modules and classes in imperative and object 
oriented languages. The relation’s merits include 


- it captures total correctness requirements,
- it has a simple intuitive operational reading,
- it captures expectations about the covariant nature of outputs and the
  contravariant nature of inputs under refinement,
- it is a precongruence with respect to a general class of environments,
- standard proof obligations used for refinement in VDM can be derived from
  it.


While its use should make specifications significantly clearer than they might
otherwise be, it doesn’t make verification tasks any easier.

We are currently exploring the use of refines_to in specifying and verifying
garbage collection algorithms. A refinement approach is appealing because it
allows the use of black-box abstract data types for both the specification and 
implementation of garbage-collected heap memory. In previous work of ours in 
this area [8], we specified correctness using linear temporal logic assertions that
had to refer to internal details of the garbage collection algorithm.


Abstract. Most industrial-size hardware verification problems are amenable to
neither fully automated nor fully manual hardware verification methods. However, 
combinations of these two extremes, human-constructed proofs with automatically 
verified lower-level steps, seem to offer great promise. In this paper we discuss a 
formal verification case study based on such a combination of theorem-proving 
and model-checking techniques. The case study addresses the correctness of a 
floating-point divider unit of an Intel IA-32 microprocessor.

The verification was carried out in the Forte framework, which consists of a 
general-purpose theorem-prover, ThmTac, on top of a symbolic trajectory eva- 
luation based model-checking engine. The correctness of the circuit was formu- 
lated and decomposed to smaller, automatically model-checkable, statements in 
a pre/postcondition framework. The other key steps of the proof involved relating 
bit vectors to integer arithmetic and general arithmetic reasoning. 


1 Introduction 


The size and complexity of industrial-scale circuits means that they are rarely amenable 
to fully automated verification, as in the traditional model-checking paradigm. On the 
other hand, the amount of detail in these circuits puts them beyond the reach of purely 
human-constructed proofs, as in the theorem-proving paradigm. Consequently, a fair 
amount of recent research has concentrated on combining the two approaches. The goal 
is to automate tedious low-level reasoning, while retaining the freedom for the human 
verifier to set the overall proof verification strategy. 

In this paper we describe a verification of the input-output correctness of a
floating-point divider unit from an Intel IA-32 microprocessor. The verification is based on
human-constructed proofs with automatically verified lower-level steps. It was carried 
out using the Forte verification system [1,2]. Forte is a combined model-checking and 
theorem-proving system that we have built on top of the Voss system [14]. The interface 
language to Voss is FL, a strongly-typed functional language in the ML family [20]. 
Model checking in Voss is done via symbolic trajectory evaluation (STE) [23]. Theorem 
proving is done in the ThmTac¹ proof tool. ThmTac is written in FL and is an LCF-style
implementation of a higher-order classical logic. 

Since the widely publicized Pentium floating point erratum in 1995, a multitude 
of divider hardware verification case studies have been published [4,8,7,5,18,19,22]. 
Floating point dividers are particularly hard to verify due to the iterative nature of division 


¹ The name “ThmTac” comes from “theorems” and “tactics”.


J. Harrison and M. Aagaard (Eds.): TPHOLs 2000, LNCS 1869, pp. 338-355, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 


Divider Circuit Verification with Model Checking and Theorem Proving 339 


algorithms, the use of multiplication in the natural high-level correctness statements, and 
the range of data. No currently known model-checking technique is capable of directly 
verifying floating point division algorithms against a high level correctness statement. 
Usually the top-level correctness statement is decomposed into small portions, which are 
then verified by automated model-checking. The reasoning that justifies the deduction 
of the high-level correctness statement is then done as a pen-and-paper proof or with a 
theorem-proving or proof-checking tool. 

We set out to perform a fully mechanized proof in a single, unified framework 
that would connect the top-level correctness statement all the way down to the actual 
register-transfer level description of the hardware. In some of the earlier case studies 
combining theorem-proving and model-checking techniques, model-checking is done in 
one system, and the results transferred to another system for theorem-proving purposes. 
In our opinion this approach still leaves room for error, as there may be unstated or poorly 
understood assumptions underlying the accuracy of translation of statements from one 
formalism and framework to another. A single, tightly integrated environment also helps 
in making the proof more manageable and understandable by allowing assumptions, 
qualifications and verified statements to be expressed in a uniform notation. 

Carrying out the entire proof in a single environment set certain requirements on the 
verification system. First, it must contain a sufficiently powerful model-checking engine. 
Secondly, as the decomposition proofs will unavoidably involve many different flavors 
of reasoning, the environment should contain a reasonably general theorem-prover and 
enable the user to write her own application-specific extensions. 

Although most of the discussion in the paper is applicable to all of the division-like 
operations supported by the hardware, we will concentrate on the (partial) remainder cal- 
culation in particular. According to the IEEE standard on floating-point arithmetic [15], 
the remainder operation is always expected to produce precise results, so this choice me- 
ans that the present paper does not need to address the separate and largely orthogonal 
issue of specifying and verifying floating point rounding. 

In Section 3 we review some basics of floating-point arithmetic, and examine a 
simple division algorithm and its implementation in hardware. In Section 4 we discuss 
the intuitive specification of the remainder operation and the main proof steps used in its 
verification. The following Sections 5, 6 and 7, deal with three general issues emerging 
in the proof: the principle of “proof by evaluation”, general arithmetic reasoning and 
the relation of bit-vector statements and arithmetical statements, and reasoning about 
flow of computation in a pre-postcondition paradigm. Section 8 then returns to the main 
verification, and gives a more detailed view of the required proof steps. 


2 The Forte System 


Forte [1] is a combined model-checking and theorem-proving system based on Voss [14]. 
The interface and scripting language to Voss is FL, a strongly-typed functional language 
in the ML family [20]. Model checking in Voss is done via symbolic trajectory evaluation. 
Theorem proving is done in the ThmTac proof tool. ThmTac is written in FL and is an 
LCF-style implementation of a higher-order classical logic.

The Forte scripting language, FL, includes binary decision diagrams (BDDs) as
first-class objects and trajectory evaluation as a built-in function. The principal aim of
ThmTac is to enable seamless transitions between model checking, where we execute
FL functions, and theorem proving, where we reason about the behavior of FL functions. 
This goal was achieved via a reflection-like mechanism named “lifted FL” [2]. In this 
section we give a brief overview of two of the underlying technologies in Forte: lifted 
FL and trajectory evaluation. 


2.1 Lifted FL 


Parsing an FL expression results in a conventional combinator graph [21] for evaluation 
purposes. Parsing a lifted FL expression results in both a combinator graph and an abstract 
syntax tree representing the text of the expression. The abstract syntax tree is available 
for FL functions to examine, manipulate, and evaluate. 

FL expressions are lifted by enclosing them in backquotes, as in `1 + 2`. If an FL expression
has type α, the lifted version of that expression will have type α expr. An expression
of type α expr can be evaluated to an expression of type α using the built-in function
eval.

Forte uses lifted FL as the term language for ThmTac and FL as the specification 
language for model checking. Our link from theorem proving to model checking is 
via the evaluation of lifted FL expressions. Roughly speaking, any FL expression that 
evaluates to true can be turned into a theorem. 

Figure 1 shows how we move between standard evaluation (i.e., programming) and 
theorem proving. The left column illustrates lifting a Boolean expression. The right 
column illustrates evaluating a lifted expression and proving a theorem with evaluation. 


fl> 1 + 4 - 2 > 2;              fl> eval ‘1 + 4 - 2 > 2‘; 
it :: bool                      it :: bool 
T                               T 
fl> ‘1 + 4 - 2 > 2‘;            fl> Prove ‘1 + 4 - 2 > 2‘ Eval_tac; 
it :: bool expr                 it :: Theorem 
‘1 + 4 - 2 > 2‘                 |- ‘1 + 4 - 2 > 2‘ 
Lifting FL expressions          Transition from evaluation to theorem proving 


Fig. 1. Evaluation and theorem proving in lifted FL 


To use lifted FL as the term language for theorem proving, we needed to add support 
for free variables and quantifiers. We supported free variables by modifying the FL parser 
to allow free variables in lifted FL expressions, but to complain about free variables in 
normal FL expressions. Evaluating a lifted FL expression that contains free variables 
will raise an exception. We implement quantifiers with regular FL functions that raise 
exceptions when evaluated and then axiomatize the behavior of the functions in ThmTac. 
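
The lifting mechanism can be imitated outside FL. The following Python sketch is an illustration only and is not Forte code: it assumes Python's standard ast module as a stand-in for the FL parser, and the names Lifted, eval_lifted, Theorem and prove_by_eval are hypothetical. It keeps both the text and the syntax tree of an expression, refuses to evaluate expressions with free variables, and turns a goal into a "theorem" only when the goal evaluates to true.

```python
import ast

class Lifted:
    """A 'lifted' expression: source text plus its abstract syntax tree."""
    def __init__(self, text):
        self.text = text
        self.tree = ast.parse(text, mode="eval")

    def free_vars(self):
        # In this toy language every bare name counts as a free variable.
        return sorted({n.id for n in ast.walk(self.tree) if isinstance(n, ast.Name)})

def eval_lifted(expr):
    """Stand-in for eval: evaluate the expression, rejecting free variables."""
    if expr.free_vars():
        raise ValueError("free variables: " + ", ".join(expr.free_vars()))
    return eval(compile(expr.tree, "<lifted>", "eval"), {"__builtins__": {}})

class Theorem:
    """Hypothetical wrapper marking a lifted expression as proved."""
    def __init__(self, statement):
        self.statement = statement

def prove_by_eval(expr):
    """Stand-in for Prove ... Eval_tac: succeed only if the goal evaluates to True."""
    if eval_lifted(expr) is True:
        return Theorem(expr)
    raise ValueError("goal did not evaluate to true")

goal = Lifted("1 + 4 - 2 > 2")
thm = prove_by_eval(goal)
```

Unlike an LCF-style kernel, nothing here prevents a caller from constructing a Theorem directly; the sketch only shows the direction of the link from evaluation to proof.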

Symbolic trajectory evaluation is based on traditional notions of digital circuit simulation 
and excels at datapath verification. Trajectory evaluation correctness statements 
are called trajectory assertions and are written as: $\models_{ckt} [ant \Longrightarrow cons]$. The antecedent 
(ant) gives an initial state and input stimuli to the circuit ckt, while the consequent (cons) 
specifies the desired response of the circuit. Formally, $\models_{ckt} [ant \Longrightarrow cons]$ means: all 
sequences that are in the language of the circuit and that satisfy the antecedent will also 
satisfy the consequent. 

Two keys to the efficiency of trajectory evaluation are the restricted language of the 
temporal formulas and the built-in support for data abstraction via a lattice of simulation 
values. The core specification language for antecedents and consequents (trajectory 
formulas) is shown in Figure 2. The specification language does not include negation and 
the only temporal operator is “next”. 


traj_form = node is value 
          | traj_form when guard 
          | N traj_form 
          | traj_form and traj_form 


The meaning of: $N^t$ (node is value when guard) is: “if guard is 
true then at time t, node has value value”; where node is a signal in 
the circuit and value and guard are Boolean expressions (BDDs). 


Fig. 2. Trajectory formulas 


Fig. 3. The four valued lattice 


The simulation model for trajectory evaluation extends the conventional Boolean 
domain to a lattice. The theory of trajectory evaluation supports general lattices. However, 
for gate-level hardware verification, the four valued lattice shown in Figure 3, which is 
used by Forte, suffices. 

In conventional symbolic simulation, the value of a signal is either a scalar value (T 
or F) or a symbolic expression representing the conditions under which the signal is T. In 
trajectory evaluation, the value X denotes lack of information: the signal could be either 
T or F. Because of the restricted temporal logic of trajectory evaluation, if a trajectory 
assertion ($\models_{ckt} [ant \Longrightarrow cons]$) holds when some signal has a value of X at some point 
in time, then the assertion will also hold when the signal has a value of either T or F. 
An essential result is that any assertion verified over a sequence containing Xs will hold 
for sequences with Xs replaced with either T or F [3,6]. It is important to note that the 
converse does not necessarily hold. That is, if a property holds both when a signal is T 
and when it is F, then it is not necessarily the case that the property will hold when the 
signal is X. 

Figure 4 is an example of how the use of X over-approximates the possible behaviors 
of a circuit. When a is either T or F, c is T. If a is a variable v, then c is T. However, 
when a is an X, we do not have any information about the value of b. In particular, we do 


342 R. Kaivola and M.D. Aagaard 


not know that b is the inverse of a. Hence, in the fourth line c cannot be anything except 
X. 

The last line demonstrates the effects of the top element of the lattice, $\top$. The top 
element, $\top$, describes conditions under which a node has conflicting values. These 
situations arise when the antecedent is inconsistent or when the value calculated by 
the circuit differs from that in the antecedent. Because $\top$ is the highest value in the 
lattice, a signal that is $\top$ satisfies all consequents. 
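
The behaviour of X and of the top element can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions, not the Forte simulator: the dual-rail encoding, the function names, and the circuit itself (b is the inverse of a, and c = a OR b) are hypothetical choices made to match the description of Figure 4 in the text.

```python
# Dual-rail encoding (an assumption of this sketch): (may_be_true, may_be_false).
X = (True, True)        # no information
T = (True, False)
F = (False, True)
TOP = (False, False)    # conflicting information

def v_not(a):
    return (a[1], a[0])

def v_or(a, b):
    return (a[0] or b[0], a[1] and b[1])

def at_least(actual, required):
    """Lattice order: actual carries at least as much information as required."""
    return actual[0] <= required[0] and actual[1] <= required[1]

def simulate(a):
    """Hypothetical circuit: b is the inverse of a, and c = a OR b."""
    b = v_not(a)
    c = v_or(a, b)
    return b, c

# c is T for both scalar values of a, but only X when a itself is X,
# because the simulator does not know that b is the inverse of a.
results = {name: simulate(val)[1] for name, val in [("F", F), ("T", T), ("X", X)]}
```

The function at_least plays the role of the consequent check: a consequent "c is T" is satisfied by the simulated value T and by the top element, but not by X.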


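Assertion | Simulation values (a, b, c) | Assertion value 

$\models_{ckt} [a \text{ is } F \Longrightarrow c \text{ is } T]$ 
$\models_{ckt} [a \text{ is } T \Longrightarrow c \text{ is } T]$ 
$\models_{ckt} [a \text{ is } v \Longrightarrow c \text{ is } T]$ 
(the fourth and fifth rows, the simulation values and the assertion values are illegible in the source) 


Fig. 4. Xs and approximation 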


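Assertion | Assertion value 

(first row, with antecedent on d and e, illegible in the source) 
$\models_{ckt} [d \text{ is } F \Longrightarrow f \text{ is } F]$ 
$\models_{ckt} [d \text{ is } v \Longrightarrow f \text{ is } F]$ 


Fig. 5. Examples of assertion results 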


Figure 5 illustrates a variety of trajectory assertions about a simple AND gate and 
the resulting values of the assertions. Note in particular, the third line, which shows 
how the value of an assertion is symbolic if the assertion is only satisfied under certain 
circumstances. 


3 Divider Circuit 


The rest of the paper is dedicated to examining the application of the verification frame- 
work outlined above to a particular case study, a subcircuit of an Intel IA-32 micropro- 
cessor carrying out floating-point division and remainder calculation. 

Let us first briefly recall some basics of binary floating-point numbers, a binary repre- 
sentation for a subset of real numbers. A typical representation is a triple f = (s,e,m), 


Divider Circuit Verification with Model Checking and Theorem Proving 343 


where the sign s is a single bit, the exponent e a bit vector of some fixed length expin, 
the mantissa m another bit vector of some fixed length manin. The real number r(f) 
encoded by the triple is $(-1)^{s} * 2^{\hat{e} - bias} * \hat{m} * 2^{-manin+1}$, where $\hat{x}$ is the natural number 
encoded by the bit vector x in the usual fashion and bias is some fixed exponent bias. 
Here the mantissa m has intuitively manin − 1 fraction bits and one bit to the left of the 
binary point, so m always encodes a value strictly less than 2. 

The IEEE standard [15] defines several different representations for floating-point 
numbers, differing on details, but all adhere to the general idea described above. The 
standard also defines special encodings for zeros, infinities and various other such ex- 
ceptional values, but in the current paper we do not need to be concerned with these. 
We call a floating-point number normal iff it is not one of these special cases and if the 
mantissa bit to the left of the binary point is 1, i.e. if m encodes a value that is at least 1. 

Since only a small subset of the reals are representable as floating-point numbers, 
not all results of arithmetic operations on floating-point numbers can necessarily be 
expressed precisely as floating-point numbers themselves. Therefore the IEEE standard 
defines the concept of rounding: determining which sufficiently close representable 
number should be used, if the accurate result is not representable. 


input: two normal floating-point numbers N = (Ns, Ne, Nm) and D = (Ds, De, Dm) 
  (we view abstractly Ne and De as natural numbers and Nm and Dm as fractions below) 
variables: floating-point numbers Q = (Qs, Qe, Qm) and R = (Rs, Re, Rm), integers imax and i 


i := 0; imax := div_iteration_count; 
Qm[0] := 0; Rm[0] := Nm; 
while i < imax do 
  /* determine quotient bit q_i ∈ {0,1} */ 
  if Rm[i] < Dm then q_i := 0 else q_i := 1 fi 
  /* update quotient and remainder accordingly */ 
  Qm[i+1] := Qm[i] + 2^(-i) * q_i; Rm[i+1] := 2 * (Rm[i] - q_i * Dm); i := i + 1 
od 
Qs := Ns xor Ds; Qe := Ne - De; Qm := Qm[imax]; 
Rs := Ns; Re := Ne - imax; Rm := Rm[imax]; 


if DIV then output ( round(Qs, Qe, Qm) ); 
if REM then output (Rs, Re, Rm); 


Fig. 6. Simple iterative division-remainder algorithm 


To illustrate the principles of the approach we used for verifying the divider circuit, 
consider the simple iterative division-remainder algorithm sketched in Figure 6. It takes 
two normal floating-point numbers N and D as input, and produces either the rounded 
quotient Q or the remainder R of N divided by D. This algorithm is essentially the same 
as the one taught in school for pen-and-paper division, although in binary instead of 
decimal. The value of div_iteration_count depends on the required precision of result. 

The primary purpose of the algorithm is division computation, and it could be argued 
that the remainder is simply a by-product of this. Nevertheless, here we concentrate on 
the remainder calculation, as this allows us to ignore the largely orthogonal issue of 
specifying formally what correct rounding means. The techniques discussed here are 
applicable to division, as well, so the choice is merely a matter of presentation. 

To be more accurate, the remainder operation is expected to produce a floating-point 
number W such that $r(W) = r(N) - \lfloor r(N)/r(D) \rfloor * r(D)$, where $\lfloor x \rfloor$ is the function 
rounding x down to the preceding integer for positive x, and up to the following integer for 
negative x. In other words, the operation should produce the remainder after computing 
an integer quotient, which corresponds to defining imax as $N_e - D_e + 1$ above. 
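
The relation between the iterative algorithm of Figure 6 and the remainder specification can be checked on small examples. The Python sketch below is an illustration under simplifying assumptions: exponents are unbiased integers, mantissas are exact rationals in [1, 2) rather than bit vectors, and the function names are hypothetical. It runs the loop with imax = Ne − De + 1 and compares the result with the truncating remainder.

```python
from fractions import Fraction
import math

def divide_remainder(n, d, imax):
    """Figure 6 on (sign, exponent, mantissa) triples, using exact rationals."""
    (n_s, n_e, n_m), (d_s, d_e, d_m) = n, d
    q_m, r_m = Fraction(0), Fraction(n_m)
    for i in range(imax):
        q_bit = 0 if r_m < d_m else 1          # quotient bit q_i
        q_m += Fraction(q_bit, 2 ** i)         # Qm[i+1] = Qm[i] + 2^-i * q_i
        r_m = 2 * (r_m - q_bit * d_m)          # Rm[i+1] = 2 * (Rm[i] - q_i * Dm)
    quotient = (n_s ^ d_s, n_e - d_e, q_m)     # unrounded quotient
    remainder = (n_s, n_e - imax, r_m)
    return quotient, remainder

def value(f):
    """Rational value of a triple (assumes an unbiased exponent)."""
    sign, exp, man = f
    return (-1) ** sign * Fraction(2) ** exp * man

def remainder_spec(n, d):
    """r(N) - trunc(r(N)/r(D)) * r(D), truncating toward zero."""
    return n - math.trunc(n / d) * d

N = (0, 3, Fraction(3, 2))    # 1.5  * 2^3 = 12
D = (0, 1, Fraction(5, 4))    # 1.25 * 2^1 = 2.5
Q, R = divide_remainder(N, D, imax=N[1] - D[1] + 1)
```

For these inputs the quotient is 4 and the remainder is 2, which agrees with 12 = 4 * 2.5 + 2.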




Fig. 7. Simple divider-remainder hardware 


To illustrate a hardware implementation of the division algorithm, Figure 7 depicts 
a simplified division circuit. The circuit has inputs for the dividend N and the divisor 
D, and it also has some control signal inputs, at least a ’start operation’ signal and 
signals specifying whether a division or remainder operation is to be performed. Mantissa 
calculation is done in a feedback loop, one iteration per clock cycle, and exponent 
calculation is done in a separate subunit. As output, the circuit produces the result W of 
the required calculation and some control information, such as various flags. 

Current industrial hardware implementations of division algorithms are many orders of 
magnitude more complex than the simple one above. For example, they may use redundant 
or multiple representations of Q and R, produce more than one quotient bit per iteration, 
or perform speculative calculations, for purposes of optimizing the speed of the circuit 
(see [9] for various options). The circuit we verified was no exception in this respect: 
it contains over 7000 latches and a print-out of the register-transfer level description 
of the circuit is about one inch thick. Nevertheless, the principles of the algorithm and 
the hardware are similar to the simple case above, and the verification of the circuit is 
structured much the same way as for the simple case. 


4 Overview of Verification 


At face value, translating the correctness statement of the algorithm to the hardware 
implementation is easy: a natural formulation would be IF ‘start operation’ signal is 
asserted, AND the circuit is instructed to perform a remainder operation, AND the input 
values are N and D, THEN at the time the circuit produces output W, the equation 
$r(W) = r(N) - \lfloor r(N)/r(D) \rfloor * r(D)$ holds. However, in the context of an actual microprocessor, 
this statement is overly optimistic and is unlikely to be true for several reasons. 
First, usually not all values of data are handled by hardware alone: to reduce the size of the 
circuit, atypical cases such as division by zero, tiny or huge results etc. are often handled 
partially by microcode routines. Secondly, the circuit is likely to function correctly only 
when started at a known, well-defined state. For example, before initialization of various 
internal control registers with suitable values has taken place, the circuit is unlikely to 
produce correct results, nor is it expected to do so. Thirdly, during the operation of the 
circuit it constantly interacts with its environment, and only if the environment behaves 
according to the protocol expected by the circuit, can the circuit itself function correc- 
tly. For example, as the exponent calculation and rounding take a proportionately much 
shorter time than the mantissa loop in the divider, it is advantageous to share these parts 
with other components performing other calculations in parallel. This means that the 
divider must negotiate their use with the other components. The integrity of the divider 
calculations depends on the assumption that, if the other components grant the divider 
access to a shared resource, they will not try to access it simultaneously themselves, thus 
possibly overwriting or corrupting data. 

Bearing these concerns in mind, an actual informal specification of the circuit’s 
functional correctness is more likely to read: 


IF the circuit is internally in a normal operating state, AND the environment 
behaves according to the expected protocol, AND ’start execution’ signal is 
asserted, AND the circuit is instructed to perform a remainder operation, AND 
the input values N and D are within the range handled by hardware, 

THEN at the time the circuit produces output W, the equation 
$r(W) = r(N) - \lfloor r(N)/r(D) \rfloor * r(D)$ holds. 


In the context of the complete microprocessor we can strengthen this statement by 
proving separately that whenever ’start execution’ signal can be asserted, the divider 
circuit is internally in normal operating state, which allows us to discharge the first 
conjunct of the antecedent in the statement above. The proof can be carried out in a 
fairly traditional temporal-logic-based model-checking framework, but as it is largely 
separate from the main proof examined in the current paper, we shall not discuss it any 
further here. 

Let us then have a brief overview of the main steps in verifying this functional 
correctness statement. As the algorithm and the hardware are iterative in nature, it is 
reasonable to start by looking for a loop invariant for the mantissa calculation. At the high 
level there is a natural loop invariant that relates the quotient and remainder mantissas 
$Q_m[i]$ and $R_m[i]$ to the input numbers D and N, derived from the fundamental defining 
equation of division: 


$(N_m = Q_m[i] * D_m + 2^{-i} * R_m[i]) \land (R_m[i] < 2 * D_m)$ 

The multiplication operator in this high-level invariant means that verifying the property 
by direct model-checking is difficult. Hence, we decompose the problem by introducing 
two lower-level properties. The first is a bit-vector invariant that is optimized for mo- 
del checking efficiency. The second property is the recurrence relation that the loop is 
supposed to compute, i.e., an equation relating current and previous loop values. 
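
As a sanity check on the high-level invariant, the short Python sketch below replays the mantissa recurrence of Figure 6 with exact rationals and tests the invariant before every iteration and at loop exit. It is an illustration only; the function names are hypothetical and the mantissas are rationals in [1, 2), not bit vectors.

```python
from fractions import Fraction

def invariant_holds(n_m, d_m, q_m, r_m, i):
    """(Nm = Qm[i] * Dm + 2^-i * Rm[i]) and (Rm[i] < 2 * Dm)."""
    return n_m == q_m * d_m + r_m / 2 ** i and r_m < 2 * d_m

def run_and_check(n_m, d_m, imax):
    """Run the recurrence, checking the invariant at every step."""
    q_m, r_m = Fraction(0), Fraction(n_m)
    for i in range(imax):
        if not invariant_holds(n_m, d_m, q_m, r_m, i):
            return False
        q_bit = 0 if r_m < d_m else 1
        q_m += Fraction(q_bit, 2 ** i)
        r_m = 2 * (r_m - q_bit * d_m)
    return invariant_holds(n_m, d_m, q_m, r_m, imax)

# All pairs of normal mantissas with three fraction bits.
all_ok = all(
    run_and_check(Fraction(n, 8), Fraction(d, 8), 6)
    for n in range(8, 16)
    for d in range(8, 16)
)
```

A test of this kind says nothing about the circuit; it only confirms that the recurrence (step C of the decomposition) preserves the invariant on the sampled values.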

This divides the verification task into seven parts: 


A Use model checking to show that the circuit satisfies a low-level bit-vector invariant. 

B Prove that the low-level bit-vector invariant implies a numerical recurrence relation. 

C Prove that the numerical recurrence relation maintains a high-level invariant. 

D Prove that the high-level invariant guarantees that the final result emerging from the 
loop is the correct unrounded result. 

E Use model-checking to show that a correct bit-vector relation holds between the loop 
output and the final output emerging from the rounder. 

F Prove that the bit vector relation between loop output and final output implies a correct 
numerical relation between these. 

G Prove that the correctness of the loop output and the correct numerical relation bet- 
ween loop output and final output implies the top-level correctness statement. 


Different types of reasoning are required for the different parts: steps A and E involve 
only plain model-checking, steps B and F require reasoning about the correspondence 
between bit-vector operations and their arithmetic counterparts, step C relies on pure 
arithmetic reasoning, and steps D and G mainly apply reasoning about the flow of 
computation with some additional doses of arithmetic reasoning. 

In the following three Sections we first look at some general technical issues emerging 
in the proof: the concept of proof by evaluation, in particular in relation to quantified 
statements; the transition from bit vectors to true arithmetic; and the formulation of and 
reasoning about statements concerning flow of computation. In Section 8 we shall return 
to the main proof in more detail, and see how these general techniques fit in. 


5 Proof by Evaluation 


As described in Section 2.1, eval is an FL function of type $\alpha$ expr $\to$ $\alpha$. Evaluation is 
available to the ThmTac user in rewriting and in tactic application. Eval_rw is a rewrite 
that evaluates a term and substitutes the result in for the original term (e.g., replace 
1+2 with 3). Eval_tac evaluates the goal of the sequent. It solves the goal if the goal 
evaluates to true and raises an exception otherwise. 
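
The idea behind an evaluation rewrite can be sketched in Python using the standard ast module (Python 3.9 or later for ast.unparse). The class EvalRewrite and the function eval_rw below are hypothetical stand-ins, not the ThmTac implementation: they replace every arithmetic subterm whose operands are both constants by its value and leave subterms containing variables untouched.

```python
import ast

class EvalRewrite(ast.NodeTransformer):
    """Replace constant binary subterms by their evaluated result."""
    def visit_BinOp(self, node):
        self.generic_visit(node)                      # rewrite subterms first
        if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            expr = ast.Expression(body=node)
            ast.fix_missing_locations(expr)
            result = eval(compile(expr, "<rewrite>", "eval"), {"__builtins__": {}})
            return ast.copy_location(ast.Constant(value=result), node)
        return node

def eval_rw(text):
    """Rewrite the expression text, evaluating closed arithmetic subterms."""
    tree = EvalRewrite().visit(ast.parse(text, mode="eval"))
    return ast.unparse(tree)

rewritten = eval_rw("x + (1 + 2)")
```

Here the subterm 1 + 2 is replaced by 3 while the variable x is kept, mirroring the example in the text.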

We have concentrated ThmTac on verifying implementation specific properties, that 
is, properties about specific circuits. This means that we tend to encounter many concrete 
values (e.g., a specific list of BDD variables in an STE run) and relatively few term variables 
(e.g., a list variable of unknown length). We originally intended for Eval_tac 
to be used just for carrying out symbolic trajectory evaluation runs. However, we have 
found that many subgoals that would normally require instantiating general theorems 
(e.g., properties of lists), are concrete and are most easily solved by evaluation. For example, 
proving [1, 2, 3]@[4, 5, 6] = [1, 2, 3, 4, 5, 6] could be done either by instantiating 
a theorem about the associativity of @ or by evaluating the goal. 

One of the critical distinctions between theorem proving and BDD-based model 
checking is that in theorem proving, variables are terms, while in model checking, 
variables are BDDs. An important feature of ThmTac is the ability to use BDD evaluation 
as a means to prove theorems about Boolean variables. In an earlier paper [2], we briefly 
described our initial technique for moving between term and BDD variables. As we 
have continued to use ThmTac, our technique has evolved. In this section we provide a 
relatively detailed description of the enhanced techniques. 


                        Term                  BDD 
variable in FL          NONE                  variable "x" 
variable in lifted FL   VAR "x"               APPLY (LEAF variable) (LEAF (STRING "x")) 
quantifier usage        $\forall x.\ T \Rightarrow x$      Quant_forall ["x"] (T ==> (variable "x")) 


Table 1. Variables and quantification for terms and BDDs 


Table 1 shows variables and quantification for terms and BDDs in FL and lifted FL. 
A term variable is a lifted FL construct that has no representation in “normal” FL. In FL, 
a BDD variable is created from a string using the function variable. In lifted FL a term 
variable is created with the VAR construct and a BDD variable is the application of the 
function variable to a string. Because Booleans are a finite domain, we can implement 
functions to quantify over Boolean variables and can translate term quantifiers over 
Boolean variables to BDD quantification. The universal BDD quantifier, Quant_forall, 
is implemented as shown in Figure 8. 


let Quant_forall var body = 
bdd_substitute (var, T) body 
AND bdd_substitute (var, F) body; 


Fig. 8. Universal quantification over BDDs 
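
The substitution-based definition of Figure 8 can be tried out without a BDD package. In the Python sketch below, Boolean functions are modelled as predicates over an assignment (a dict from variable names to truth values) instead of BDDs; this representation and all function names are assumptions made for the example only.

```python
def variable(name):
    """Boolean function that returns the value of one variable."""
    return lambda env: env[name]

def bdd_substitute(name, value, body):
    """body with the variable fixed to a constant value."""
    return lambda env: body({**env, name: value})

def quant_forall(name, body):
    """Universal quantification as a conjunction of both substitutions."""
    return lambda env: (bdd_substitute(name, True, body)(env)
                        and bdd_substitute(name, False, body)(env))

def implies(p, q):
    return lambda env: (not p(env)) or q(env)

const_true = lambda env: True

# forall x. T ==> x   (the quantifier usage shown in Table 1)
example = quant_forall("x", implies(const_true, variable("x")))
# forall x. x ==> x
tautology = quant_forall("x", implies(variable("x"), variable("x")))
```

Evaluating example({}) gives False and tautology({}) gives True; because x is bound by the quantifier, neither result depends on the assignment passed in.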


Transforming an expression from term quantifiers and term variables to BDD quan- 
tifiers and BDD variables is complicated by both performance and soundness issues. A 
naive, but sound, method would be to have ThmTac provide a new and unique name 
for each BDD variable. This is infeasible for performance reasons though, because the 
variable would not be placed in an optimal location in the all-important BDD-variable 
order defined by the user. Additionally, increasing the number of BDD variables slows 
down some BDD operations. Thus, for performance reasons, we needed to allow users 
to specify the mapping from term variables to BDD variables. 

However, giving users complete freedom to choose BDD variable names would intro- 
duce soundness problems. Users could inadvertently prove contradictions if they chose 
BDD variables that were already used within the proof. Large verifications typically 
use hundreds of different BDD variables, which makes it relatively easy to lose track of 
where each variable is used. Thus, for correctness purposes, we have given ThmTac the 
burden of ensuring that users provide valid mappings of term variables to BDD variables. 

Replacing term quantifiers and variables with BDD quantifiers and variables is im- 
plemented by the rewrite Term2BDD. Users provide a variable mapping to Term2BDD 
that specifies the BDD variables to use in place of each term variable. The term variables 
are allowed to be of type bool or bool list. 

In order to rewrite $\forall x.P$ with Term2BDD [("x", vs)], where x may be free in P 
and where vs is of type string list, the following conditions must hold: 


1. x must be of type bool or bool list. 
2. If x is of type bool, then vs must be a singleton list. 
3. The names in vs must be disjoint from the BDD variables that appear in P. 


If these conditions are met, then Term2BDD: 


1. replaces the term quantifier $\forall x$ with the BDD quantifier Quant_forall vs 
2. substitutes all occurrences of VAR "x" in P with map variable vs 


One subtlety in the above conditions is that term quantification over a list is for 
lists of all lengths, while BDD quantification is for fixed-length lists. To be completely 
rigorous, we should require that each term quantifier restrict the bound variable to a 
specific length, such as: $\forall x.\ (\text{length } x = 32) \Rightarrow P$. We plan to address this shortcoming 
in the near future. So far, we have not encountered any theorems in our normal work that 
would be true if instantiated with a list of incorrect length and false if presented with a 
list of correct length. 

Gordon [10] has an alternative method for transforming from terms to BDDs in HOL. 
The principal distinction between his work and ours is that his transformations go from 
a purely term world to a purely BDD world. In HOL, term expressions do not contain 
BDD variables. In contrast, a lifted FL expression might contain BDD variables, and so 
we require an additional safety check before carrying out the transformation. 


6 Arithmetic Reasoning 


The top-level input-output correctness statement and the high-level loop invariant of the 
divider are naturally expressed in terms of mathematical entities and operations. Much 
of the reasoning related to the preservation of the high-level invariant is also most naturally 
expressed in terms of arithmetic and general arithmetical rules. On the other hand, 
model-checking techniques using symbolic values and BDD-based representations cannot 
deal with statements involving integer or real operations. Instead, the model-checked 
statements need to be expressed in terms of bit vectors and bit-vector arithmetic. 

Bridging the semantic gap between bit vectors and numbers brings about several 
issues. Assume for the moment that we are dealing with natural numbers on the 
abstract level, and n-bit vectors on the concrete level, and consider a simple combinational 
circuit with two n-bit inputs a and b, and one n-bit output c. If the circuit is 
expected to compute the numeric operation $\odot$ (e.g. addition, subtraction etc), its natural 
correctness statement would be expressed by $\hat{c} = \hat{a} \odot \hat{b}$, where, as before, $\hat{x}$ denotes the 
usual conversion of a bit vector to a natural number. For model-checking, we would 
reformulate this statement as $c = a \odot_{bv} b$, where $\odot_{bv}$ is the bit-vector counterpart of $\odot$. 
Deducing the natural correctness statement from the model-checkable one depends then 
on a general correspondence theorem relating $\odot_{bv}$ and $\odot$: 


$\forall x.\forall y.\ \widehat{x \odot_{bv} y} = \hat{x} \odot \hat{y}$ 


Effectively this states that the concrete bit-vector operation implements correctly the 
abstract mathematical operation. 

However, the general correspondence theorem is unlikely to be universally true. Due 
to the representation of values by n bits, it is far more likely that the equality only 
holds modulo $2^n$; if the bit-vector operation wraps over, it no longer corresponds 
directly to its numeric counterpart, which causes various complications. First, reasoning in arithmetic modulo 
$2^n$ is harder and requires more care than in ordinary integer arithmetic. Secondly, as 
the main correctness statement is still formulated in ordinary arithmetic, we somehow 
must regain that from arithmetic modulo $2^n$. This means that usually we must keep 
track of side conditions guaranteeing that the bit-vector operation does not wrap over. 
In simple cases these side conditions can be manageable, although an extra burden, but 
in more involved cases, with nested expressions, keeping track of all the necessary side 
conditions becomes extremely cumbersome. 
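
A small Python example makes the wrap-around problem concrete. It is a sketch with hypothetical names, modelling n-bit unsigned vectors as non-negative integers below $2^n$: the correspondence with ordinary addition holds exactly when the sum fits in n bits, and holds modulo $2^n$ in every case.

```python
def bv_add(a, b, n):
    """n-bit unsigned addition: the result wraps modulo 2**n."""
    return (a + b) % (2 ** n)

def correspondence_holds(a, b, n):
    """Does the bit-vector result agree with ordinary integer addition?"""
    return bv_add(a, b, n) == a + b

n = 4
no_wrap = correspondence_holds(3, 4, n)     # 3 + 4 = 7 fits in 4 bits
wraps = correspondence_holds(9, 8, n)       # 9 + 8 = 17 wraps to 1
```

The side condition that must be tracked by hand in this setting is a + b < 2**n.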

To alleviate this problem, we wrote a library of routines for bit-vector integer arith- 
metic operations in which each vector and operation is augmented with an extra flag-bit 
detecting wrap-around, loss of precision or other such event. So, instead of just bit vec- 
tors, the basic entities handled by the new library are (bit vector, exactness bit) pairs. 
Using these routines, a user can build expressions in bit-vector arithmetic in the usual 
fashion, and by checking the value of the exactness bit determine whether the bit-vector 
operation corresponds to its integer arithmetic counterpart. 

For example, the addition operation in this library is defined as shown in Figure 9, 
where ADD_bv_bv is the plain bit-vector addition operation, and msb a function returning 
the most significant bit of its argument. Intuitively the definition states that the result 
is exact iff both of the operands are exact and no wrap-around occurs. Other supported 
operations are subtraction, negation, multiplication by a power of 2, modulus by a power 
of 2, and equality and magnitude comparisons. 


let +@ (bv1,e1) (bv2,e2) = 
    let bv = ADD_bv_bv bv1 bv2 in 
    let e = e1 AND e2 AND 
            ((msb bv1 != msb bv2) OR (msb bv1 = msb bv)) 
    in 
    (bv, e); 


Fig. 9. Bit-vector addition with exactness test 


For each of these operations the library contains a formally derived correspondence 
theorem. For example, for the addition operation above, this theorem states that for all 
bv1, e1, bv2, e2 and bv, if (bv1, e1) +@ (bv2, e2) = (bv, T), then $\widehat{bv1} + \widehat{bv2} = \widehat{bv}$, where 
+ is regular integer addition. Using the theorem, the user can lift model-checked results 
stated in terms of +@ to statements formulated in terms of + without manually keeping 
track of the side conditions. Analogous correspondence theorems exist for the other 
operations. 
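
The exactness-bit addition of Figure 9 and its correspondence theorem can be exercised on short vectors. The Python sketch below is an assumption-laden illustration, not the library itself: bit vectors are stored as non-negative integers of width n, they are read as signed two's complement numbers (which is what the most-significant-bit test suggests), and the names add_exact and to_int are hypothetical.

```python
def msb(bv, n):
    """Most significant bit of an n-bit vector stored as an int."""
    return (bv >> (n - 1)) & 1

def to_int(bv, n):
    """Signed (two's complement) value of an n-bit vector."""
    return bv - (1 << n) if msb(bv, n) else bv

def add_exact(x, y, n):
    """Addition on (bit vector, exactness bit) pairs, following Figure 9."""
    (bv1, e1), (bv2, e2) = x, y
    bv = (bv1 + bv2) & ((1 << n) - 1)            # plain n-bit addition
    e = e1 and e2 and (msb(bv1, n) != msb(bv2, n) or msb(bv1, n) == msb(bv, n))
    return bv, e

def theorem_holds(n):
    """Check the correspondence theorem exhaustively for width n."""
    for bv1 in range(1 << n):
        for bv2 in range(1 << n):
            bv, e = add_exact((bv1, True), (bv2, True), n)
            if e and to_int(bv1, n) + to_int(bv2, n) != to_int(bv, n):
                return False
    return True

exact_sum = add_exact((3, True), (4, True), 4)      # 3 + 4 = 7, no wrap-around
inexact_sum = add_exact((7, True), (1, True), 4)    # 7 + 1 overflows 4-bit signed
```

In the second call the result vector is still computed, but its exactness bit is cleared, so the correspondence theorem makes no claim about it.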

Once the model-checked results have been lifted to the level of integer arithmetic, 
all the usual machinery for reasoning about them is at our disposal: laws of associativity, 
distributivity, cancellation etc, various rewriting rules for simplification of expressions, 
and so on. This part of the verification was a fairly straightforward task, given the 
relatively well-developed support for arithmetic reasoning in ThmTac. Notice, however, 
that all the proofs were carried out in terms of integers and not real numbers, although 
the latter would be the most natural choice for reasoning about floating-point operations. 
Ideally we would like to operate in terms of real number arithmetic as e.g. in [12], but 
as support libraries for this are not in place in ThmTac yet, integers were chosen as a 
pragmatic compromise. 


7 Reasoning about Flow of Computation 


In the arithmetic reasoning described in the previous section, we are relating statements 
about bit-vector arithmetic and logical relations to statements about relations between 
integers. This is a mathematically well-understood area, and it is easy to find the right 
conceptual level for the proofs. 

However, when we are deducing the correctness of the loop output from the preser- 
vation of the loop invariant we are reasoning about the temporal progress of computation 
in the circuit. Here it is not quite as clear what the appropriate framework for expressing 
the proof is, and what general principles of reasoning are at stake. To bring in some con- 
ceptual machinery to structure the proof, we chose to formulate these temporal aspects 
of the proof in a variant of the traditional pre-postcondition framework. 

The theory of pre-postcondition triples is a standard framework for specification and 
pen-and-paper verification of traditional sequential programs (see [11,17] for introduc- 
tion). In this approach, statements about programs are of the form {P}S{Q}, where P 
and Q are logical properties, and S is a program. Such a triple formalizes the statement 
precondition P guarantees postcondition Q after running S, or more accurately for any 
possible execution of program S, if the execution starts in a situation satisfying P, then 
it terminates in a finite time and leads to a situation satisfying Q. 

To relate the pre-postcondition approach to circuits, consider a circuit ckt, and assume 
trajectory $tr_{in}(x)$ binds a vector x of Booleans to input signals at the time the input 
is intuitively read, and that a vector y is similarly bound to some output signals by 
trajectory $tr_{out}(y)$. If a formula $\phi_{in}(x)$ expresses the precondition the input is supposed 
to meet, and $\phi_{out}(x, y)$ the postcondition the circuit is supposed to produce, the statement 
precondition $\phi_{in}$ guarantees postcondition $\phi_{out}$ can be expressed by the formula 


$\forall in.\ \phi_{in}(in) \Rightarrow (\exists out.\ (\models_{ckt} [tr_{in}(in) \Longrightarrow tr_{out}(out)])) \land 
(\forall out.\ (\models_{ckt} [tr_{in}(in) \Longrightarrow tr_{out}(out)]) \Rightarrow \phi_{out}(in, out))$ 


In the following, we write $\{\phi_{in}\}(tr_{in}, ckt, tr_{out})\{\phi_{out}\}$ as a shorthand for this. 

To see that this formula indeed captures the intuition described above, recall from 
Section 2 that if the expression $\models_{ckt} [tr_{in}(x) \Longrightarrow tr_{out}(y)]$ is true, where x and y are 
Boolean vectors of appropriate lengths, then for every execution e of the circuit ckt, or 
every sequence in the language of the circuit, if $tr_{in}(x)$ is true of e, then so is $tr_{out}(y)$. 
So, the formula states that for any vector x satisfying the precondition $\phi_{in}(x)$, 


1 there is some output vector y such that for every execution e, if $tr_{in}(x)$ is true of e, 
then so is $tr_{out}(y)$, and 


2 for every vector y for which 1 holds, the postcondition property $\phi_{out}(x, y)$ holds. 


Notice that 2 alone does not suffice, since if 1 fails, 2 would hold vacuously for all conditions 
$\phi_{out}$. Our formulation is slightly different from that of [13], although equivalent 
under natural assumptions. The current formulation makes it easier to derive some of 
the reasoning rules discussed below, in particular the postcondition conjunction rule. 

Given a pre-postcondition statement in the form above, we can, in principle, compute 
the validity of it directly using Forte: the universal and existential quantifications can 
be replaced by BDD-quantification over symbolic values as explained in Section 5, 
the validity of $\models_{ckt} [tr_{in}(in) \Longrightarrow tr_{out}(out)]$ can be evaluated by STE using symbolic 
values, and the rest just consists of evaluation of standard logical operations, again 
using symbolic values. Whether direct evaluation is feasible in practice depends on 
the formulae $\phi_{in}$ and $\phi_{out}$, the computation the circuit performs on the input values, 
and whether these are efficiently representable using BDDs. In reality, we can directly 
compute the validity of a pre-postcondition statement only in very limited circumstances, 
and we need inference rules to combine model-checkable statements into larger ones. 
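
For very small circuits the pre-postcondition formula can be evaluated by brute force, which may help in reading its two conjuncts. The Python sketch below is a toy model and not Forte: the "circuit" is a plain function from input vectors to output vectors, the trajectory assertion is approximated by the hypothetical ste_holds (true exactly when the circuit maps the input to the given output, ignoring X and the top element), and enumeration over all Boolean vectors stands in for BDD quantification.

```python
from itertools import product

def ste_holds(circuit, inp, out):
    """Toy stand-in for the trajectory assertion on scalar vectors."""
    return circuit(inp) == out

def pre_post(phi_in, circuit, phi_out, n_in, n_out):
    """Evaluate {phi_in}(tr_in, ckt, tr_out){phi_out} by enumeration."""
    for inp in product([False, True], repeat=n_in):
        if not phi_in(inp):
            continue
        outs = [o for o in product([False, True], repeat=n_out)
                if ste_holds(circuit, inp, o)]
        if not outs:                                   # conjunct 1: an output exists
            return False
        if not all(phi_out(inp, o) for o in outs):     # conjunct 2: postcondition
            return False
    return True

def half_adder(inp):
    """Hypothetical example circuit: (sum, carry) of two input bits."""
    a, b = inp
    return (a != b, a and b)

def adds_correctly(inp, out):
    return int(inp[0]) + int(inp[1]) == int(out[0]) + 2 * int(out[1])

valid = pre_post(lambda inp: True, half_adder, adds_correctly, 2, 2)
```

Replacing adds_correctly by a postcondition the circuit does not meet makes pre_post return False, and a circuit model for which no output satisfies ste_holds fails on the first conjunct instead of passing vacuously.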


As a part of our verification infrastructure, we wrote a set of FL routines which
allow the user to work directly on the pre-postcondition level and abstract away from
the details of evaluation. In addition to arguments corresponding to $\phi_{in}$, $\phi_{out}$, $tr_{in}$, ckt
and $tr_{out}$, these routines have some additional parameters, such as STE weakening lists,
which are used to guide the evaluation without affecting the semantics.

The main reason for introducing the pre-postcondition framework was to enable 
reasoning about the flow of computation in a well-structured manner. To this purpose, 
we defined and proved a set of general reasoning rules for pre-postcondition statements 
(Figure 10). These rules are closely related to the ones commonly used for traditional 
sequential programs. All these reasoning rules were formally derived from a set of 
simple axioms regarding STE [13] with some general first-order logic reasoning. The 
most conspicuous absence from this list is a general proof rule for iteration: as in our 
case all loops have a fixed upper bound for number of iterations, the weaker bounded 
induction rule below suffices. 

Since the formulae in the pre-postcondition statements are parametrized with the
vectors in and out, we need some additional notation to express the rules succinctly. If
f and g are two-argument functions, we write $f \wedge'' g$ for $\lambda x.\lambda y.\, f(x,y) \wedge g(x,y)$, and
analogously for other propositional connectives. If f is a one-argument function, we
write $f'$ for the function $\lambda x.\lambda y.\, f(y)$.
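
As a small illustration of this notation, the two lifting operations can be written as higher-order functions. This is a sketch with hypothetical names (`lift_and`, `prime`) and integer arguments in place of bit vectors.

```python
def lift_and(f, g):
    """f AND'' g: pointwise conjunction of two two-argument predicates."""
    return lambda x, y: f(x, y) and g(x, y)

def prime(f):
    """f': turn a one-argument predicate into a two-argument one that ignores x."""
    return lambda x, y: f(y)

# Hypothetical example predicates over integers.
is_even = lambda v: v % 2 == 0
greater = lambda x, y: y > x

post = lift_and(prime(is_even), greater)   # "the output is even and exceeds the input"
print(post(3, 4), post(3, 5), post(6, 4))
```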
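Precondition strengthening:

$$\frac{\{\phi_{in}\}(tr_{in}, ckt, tr_{out})\{\phi_{out}\} \qquad \forall x.\ \psi_{in}(x) \Rightarrow \phi_{in}(x)}{\{\psi_{in}\}(tr_{in}, ckt, tr_{out})\{\phi_{out}\}}$$

Postcondition weakening:

$$\frac{\{\phi_{in}\}(tr_{in}, ckt, tr_{out})\{\phi_{out}\} \qquad \forall x.\forall y.\ \phi_{out}(x,y) \Rightarrow \psi_{out}(x,y)}{\{\phi_{in}\}(tr_{in}, ckt, tr_{out})\{\psi_{out}\}}$$

Pre- to postcondition transfer:

$$\frac{\{\phi_{in}\}(tr_{in}, ckt, tr_{out})\{\phi_{out}\} \qquad \forall x.\forall y.\ \phi_{in}(x) \wedge \phi_{out}(x,y) \Rightarrow \psi_{out}(x,y)}{\{\phi_{in}\}(tr_{in}, ckt, tr_{out})\{\psi_{out}\}}$$

Postcondition conjunction:

$$\frac{\{\phi_{in}\}(tr_{in}, ckt, tr_{out})\{\phi_{out1}\} \qquad \{\phi_{in}\}(tr_{in}, ckt, tr_{out})\{\phi_{out2}\}}{\{\phi_{in}\}(tr_{in}, ckt, tr_{out})\{\phi_{out1} \wedge'' \phi_{out2}\}}$$

Sequential composition:

$$\frac{\ldots}{\{\phi_{in}\}(tr_{in}, ckt, tr_{out})\{\phi_{out}\}}$$

Bounded induction:

$$\frac{\forall i.\ (0 \le i < n) \Rightarrow \{\phi_i\}(tr_i, ckt, tr_{i+1})\{\phi'_{i+1}\}}{\{\phi_0\}(tr_0, ckt, tr_n)\{\phi'_n\}}$$


Fig. 10. Pre-post condition inference rules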


8 Main Proof 


Returning to the main verification task, let us now see how the technical machinery
discussed in previous sections can be used to formulate key steps of the proof.

We first need to address a general pragmatic issue: the proliferation of quantifiers. When
introducing the pre-postcondition formalism above, we discussed circuits as if
they had a single input and a single output. In reality, though, we are dealing with verification
tasks with tens of input and output vectors. If we introduce a separate term for each vector,
the number of quantified terms quickly becomes unmanageable. Consequently, we took
another approach, and used a single term, denoting a very long vector, for quantification,
and access routines for extracting slices out of it. For example, instead of writing
$\forall x.\forall y.\forall z.\,\phi(x,y,z)$, where each of x, y and z ranges over bit vectors of length 16, we
would write $\forall w.\,\phi(x(w), y(w), z(w))$, where w ranges over all bit vectors of length 48,
and x, y and z are routines extracting the first, middle or last 16 bits out of their argument,
respectively. This is similar to techniques used by Joyce [16] and Windley [24], where
a representation variable, typically denoting the state of the system being verified, is
threaded through a verification. In the following definitions, let $D_s$, $D_e$, $D_m$, $N_s$, $N_e$,
$N_m$, $W_s$, $W_e$, $W_m$, $R_m$, $Q_m$ and ctl be functions extracting disjoint slices of appropriate
lengths out of an argument vector, define $D(x) = D_s(x) @ D_e(x) @ D_m(x)$ where @ is
the append-operation, and define N(x) and W(x) similarly.
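
The single-vector encoding can be sketched as follows. This is an illustrative assumption-laden example, not the FL code used in the verification: slices are 2 bits wide instead of 16 so that explicit enumeration stays small, and `make_slice`, `phi` and `xor` are hypothetical names.

```python
from itertools import product

WIDTH = 2  # the text uses 16-bit slices of a 48-bit vector; 2 keeps enumeration small

def make_slice(index, width=WIDTH):
    """Return an access routine extracting the index-th slice of a bit tuple."""
    return lambda w: w[index * width:(index + 1) * width]

# Access routines for the first, middle and last slice of the long vector.
x, y, z = make_slice(0), make_slice(1), make_slice(2)

def vectors(n):
    """Iterate over all bit vectors of length n."""
    return product((0, 1), repeat=n)

def xor(a, b):
    return tuple(p ^ q for p, q in zip(a, b))

def phi(a, b, c):
    # Hypothetical example property: bitwise xor is associative.
    return xor(xor(a, b), c) == xor(a, xor(b, c))

# Three separate quantifiers ...
three_quantifiers = all(phi(a, b, c)
                        for a in vectors(WIDTH)
                        for b in vectors(WIDTH)
                        for c in vectors(WIDTH))
# ... versus one quantifier over the concatenated vector.
one_quantifier = all(phi(x(w), y(w), z(w)) for w in vectors(3 * WIDTH))

print(three_quantifiers, one_quantifier)
```

Because the slices are disjoint and together cover the whole vector, both forms range over exactly the same triples.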

We define the input-output correctness relation OUT in terms of integer operations 
as follows: 


$$OUT(x,y) =_{df} \exists Q.\ (ri(N(x)) = Q * ri(D(x)) + ri(W(y))) \wedge |ri(W(y))| < |ri(D(x))|$$

where the floating-point to integer conversion is: $ri((s,e,m)) = (-1)^s * 2^e * m$. Now
the top-level correctness statement (goal of proof step G) can be formalized as


$$\{IN\}(tin, ckt, tout)\{OUT\}$$

where


$$IN(x) =_{df} normal(N(x)) \wedge normal(D(x)) \wedge in\_range(N(x), D(x)) \wedge remainder\_op\_executed(ctl(x)) \wedge internal\_state\_ok(ctl(x)) \wedge environment\_ok(ctl(x))$$


where the intuitive meanings of the conjuncts of IN correspond to their names, and
where tin(x) is a trajectory binding N(x) and D(x) to the corresponding input signals
and ctl(x) to relevant control signals at the time the operation is started, and tout(x)
is a trajectory binding W(x) to the output signals at the time the circuit is expected to
produce output.
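
The relation OUT can be exercised on concrete values. The sketch below assumes that a floating-point value is given directly as a triple (s, e, m) of Python integers rather than as slices of a bit vector, and that the quantified Q ranges over the integers; `out_holds` is a hypothetical helper name.

```python
from fractions import Fraction

def ri(fp):
    """Conversion ri((s, e, m)) = (-1)^s * 2^e * m, computed exactly."""
    s, e, m = fp
    return (-1) ** s * Fraction(2) ** e * m

def out_holds(n, d, w):
    """OUT: some integer Q satisfies ri(n) = Q*ri(d) + ri(w), and |ri(w)| < |ri(d)|."""
    vn, vd, vw = ri(n), ri(d), ri(w)
    if vd == 0:
        return False
    q = (vn - vw) / vd          # the only candidate for Q
    return q.denominator == 1 and abs(vw) < abs(vd)

seven = (0, 0, 7)   # +7
two = (0, 1, 1)     # +2
print(out_holds(seven, two, (0, 0, 1)))    # 7 = 3*2 + 1
print(out_holds(seven, two, (0, 0, 3)))    # 7 = 2*2 + 3, but |3| >= |2|
print(out_holds(seven, two, (0, -1, 1)))   # remainder 1/2 leaves a non-integer quotient
```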

When expressed in terms of integer operations, the high-level mathematical loop 
invariant MI, the recurrence relation MR the mantissa loop is expected to compute, 
and the relation LO between loop output and final output can be defined as follows: 


$$MI_i(x) =_{df} (N_m(x) * 2^{i} = Q_m(x) * D_m(x) + 2^{\ldots} * R_m(x)) \wedge (R_m(x) < 2 * D_m(x))$$
$$MR_i(x,y) =_{df} (Q_m(y) = Q_m(x) \wedge R_m(y) = 2 * R_m(x)) \vee (Q_m(y) = Q_m(x) + 2^{\ldots} \wedge R_m(y) = 2 * (R_m(x) - D_m(x)))$$


$$LO(x,y) =_{df} W_s(y) = N_s(x) \wedge W_e(y) = D_e(x) \wedge W_m(y) = R_m(x)$$

The lower-level bit vector invariant BI can be expressed as

$$BI_i(x) =_{df} IN(x) \wedge loop\_data\_in\_range(x) \wedge loop\_data\_consistent(x)$$


The single most complex issue in the entire verification is determining the invariant BI 
exactly: some parts of it are easy, like the expected ranges of data values in the loop, but 
some are extremely low-level and implementation-dependent. In the actual verification 
the formula loop_data_in_range has four conjuncts and loop_data_consistent seven,
but nailing down the last few of these precisely took several weeks. The bit-vector
recurrence relation $BR_i$ is simply the bit-vector counterpart of $MR_i$, using the
exactness-checking operations discussed in Section 6.

Let then $tloop_i(x)$ be a trajectory binding $R_m(x)$, $Q_m(x)$ and other data items
manipulated by the mantissa loop to the corresponding signals at the loop at the time
iteration i is being performed, and define $tl_i(x) =_{df} tin(x)$ and $tloop_i(x)$. In terms of
these definitions, proof step A consists of model-checking the following statements:


$$\{IN\}(tin, ckt, tl_0)\{BI'_0 \wedge'' MI'_0\}$$
$$\forall\, 0 \le i < imax.\ \{BI_i\}(tl_i, ckt, tl_{i+1})\{BI'_{i+1} \wedge'' BR_i\}$$
Because BI really is a loop invariant, no major performance issues arose
in the model-checking; the largest BDDs involved in the verification had only about 20
million nodes, using a rather obvious variable ordering.

Proof steps B and C then consist of proving the following statements: 


$$\forall x.\forall y.\ BI_i(x) \wedge BR_i(x,y) \Rightarrow MR_i(x,y)$$
$$\forall x.\forall y.\ MI_i(x) \wedge MR_i(x,y) \Rightarrow MI_{i+1}(y)$$


Since $BR_i$ is defined as the bit-vector counterpart of $MR_i$, the first of these is easy, and
the second involves some routine arithmetic reasoning. Now the loop output correctness
statement (goal of proof step D) can be formulated as


$$\{IN\}(tin, ckt, tl_{imax})\{BI'_{imax} \wedge'' MI'_{imax}\}$$

and proved from A, B and C by pre-postcondition reasoning.

Since the remainder operation does not involve any rounding, proof steps E and
F, verifying the correctness of the rounder, are easy. Both can be formulated by the
statement

$$\{BI_{imax}\}(tl_{imax}, ckt, tout)\{LO\}$$


In operations with rounding, the goal of step E would consist of a bit-vector level 
specification and F' of a mathematical specification of correct rounding. Finally, the 
main proof goal G can be derived from D and F by sequential composition, and some 
straightforward arithmetic reasoning. 

One detail that has been glossed over in the description of the verification above is
that the value of imax varies according to the input values. This means that we have to
show the statements above for all the potential values of imax, under the assumption
that $imax = N_e(in) - D_e(in) + 1$. As only a restricted range of values is handled
by the hardware, this can be done by enumeration. In practice large parts of the proof
are independent of imax and only need to be verified once.


9 Conclusion 


We have described a formal verification case study of a floating-point divider circuit, 
using a combination of theorem-proving and model-checking techniques. To our kno- 
wledge, this is currently one of the most complex floating-point circuits that has been 
formally verified; the verification described here took about eight person-months. The 
advantages of the chosen approach were the safety of a mechanically verified proof 
combined with the freedom of complete control over the proof approach and details, 
which allowed us to use a wide variety of technical and conceptual machinery to tackle 
the complexity of the verification. 
Abstract. Over the last decade, the increasing demand for the validation
of safety critical systems has led to the development of domain-specific
programming languages (e.g. synchronous languages) and automatic
verification tools (e.g. model checkers). Conventionally, the verification
of a reactive system is implemented by specifying a discrete model
of the system (i.e. a finite-state machine) and then checking this model
against temporal properties (e.g. using an automata-based tool). We
investigate the use of a synchronous programming language, SIGNAL, and
of a proof assistant, Coq, for the specification and the verification of
co-inductive properties of the well-known steam-boiler problem.

By way of this large-scale case study, the SIGNAL-Coq formal approach,
i.e. the combined use of SIGNAL and Coq, is demonstrated to be a
well-suited and practical approach for the validation of reactive systems.
Indeed, the deterministic model of concurrency of SIGNAL, for specifying
systems, together with the unparalleled expressive power of the Coq
proof assistant, for verifying properties, means that neither the
specification tool nor the verification tool imposes a limitation that
forces a compromise.

Keywords: synchronous programming, theorem proving, the steam- 
boiler problem. 


1 Introduction 


In recent years, the verification of safety critical systems has become an area
of increasing importance for the development of software in sensitive fields:
medicine, telecommunication, transportation, energy.

The notion of reactive system has emerged to focus on the issues related 
to the control of interaction and of response-time in mission-critical systems. 
This has led to the development of specific programming languages and related 
verification tools for reactive systems. 

Conventionally, the verification of a reactive system is implemented by, first,
elaborating a discrete model of the system (i.e. an approximation of its behaviour
by a finite-state machine) specified in a dedicated language (e.g. a synchronous
programming language) and, then, by checking a property against the model
(i.e. model checking).

Synchronous languages (such as ESTEREL [5], LUSTRE [9], SIGNAL [4], STATE- 
CHARTS [10]) have proved to be well adapted to the verification of safety and 
liveness properties of reactive systems. For instance, model checking has been
used at an industrial scale on SIGNAL programs to check properties such as
liveness, invariance, reachability and attractivity.

Whereas model checking efficiently decides discrete properties of finite state
systems, the use of formal proof systems makes it possible to prove numerical and
parameterized properties about infinite state systems. Using a proof system, we can
not only prove the safety and liveness of a reactive system but also its correctness
and completeness.

Such a proof is of course not automatic and requires interaction with the
user to direct its strategy. The prover can nonetheless automate the most tedious
and mechanical parts of the proof. In general, formal proofs of programs are
difficult and time-consuming. In the particular case of modeling a reactive system using
a declarative synchronous language, however, this difficulty is mitigated by
the elegant stylistic combination of declarative programming and relational
modeling.

We investigate the combined use of the synchronous language SIGNAL and of 
the proof assistant Coq for specifying and verifying properties of a large-scale 
case study, namely, the steam-boiler problem. 


2 The Signal-Coq Formal Approach 


Synchronous languages assume that computation takes no time (this is the so- 
called “synchronous hypothesis”). Actually, this means that the duration of com- 
putations is negligible in comparison to the time of reaction of the system. This 
synchronous hypothesis is particularly well adapted to verify safety and some 
forms of liveness properties. SIGNAL is a synchronous, declarative, data-flow ori- 
ented programming language. It is built around a simple paradigm: a process is 
a system of equations on signals; and a minimal kernel of primitive operators. 
A signal represents an infinite flow of data. At every instant, it can be absent 
or present with a value. The instants when values are present are determined by 
its associated clock. The interested reader may find more about SIGNAL in [4].

Coq [7] is a proof assistant for higher-order logic. It allows the development
of computer programs that are consistent with their formal specification. The
logical language used in Coq is a variety of type theory, the Calculus of Inductive
Constructions [15]. It has been extended with co-inductive types (types defined
as greatest fixed points rather than as least fixed points [8]) to handle infinite
objects. It is thus well suited to represent signals.

In [14], we have introduced a co-inductive semantics for the kernel of the
language SIGNAL and formalized it in the proof assistant Coq. In this section,
we summarize the Coq definitions given for the primitive operators of SIGNAL.
The interested reader may find the generalization to the complete language in [13].

A signal X is defined as a stream of $\bot$ (absence) and values v. Let D be a set of values.
The set of signals $S_D$ is the largest set such that:


$$S_D = \{\bot.X \mid X \in S_D\} \cup \{v.X \mid v \in D, X \in S_D\}$$
Instantaneous Relation. The relation $R^n_P$ is used in SIGNAL to specify an
instantaneous relation between n signals. At each instant, these signals satisfy the
predicate P. In Coq, according to the Curry-Howard isomorphism, a proof-specification
pair is represented by a term-type pair. The type of non-well-founded
proofs of $R^n_P$ is introduced as a co-inductive type. Co-induction is needed to deal
with infinite signals. For instance, $R^2_P$ is introduced as follows:


CoInductive Relation2 [U,V:Set; P:U->V->Prop] :
  (Signal U)->(Signal V)->Prop :=
    relation2_a : (X:(Signal U))(Y:(Signal V))
      (Relation2 P X Y)->
      (Relation2 P (Cons (absent U) X) (Cons (absent V) Y))
  | relation2_p : (X:(Signal U))(Y:(Signal V))(u:U)(v:V)
      (P u v)->(Relation2 P X Y)->
      (Relation2 P (Cons (present u) X) (Cons (present v) Y)).
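
The two constructors can be read as a check applied instant by instant. The following sketch tests the relation on finite prefixes only; this is an assumption made for illustration, since the Coq definition is co-inductive and ranges over infinite signals. Absence is encoded as `None` and `relation2` is a hypothetical helper name.

```python
ABSENT = None  # encoding of the "absent" instant; values are assumed to differ from None

def relation2(p, xs, ys):
    """Check Relation2 P X Y on two finite signal prefixes of equal length."""
    if len(xs) != len(ys):
        return False
    for x, y in zip(xs, ys):
        if x is ABSENT and y is ABSENT:
            continue                      # constructor relation2_a
        if x is not ABSENT and y is not ABSENT and p(x, y):
            continue                      # constructor relation2_p
        return False                      # not synchronous, or P violated
    return True

successor = lambda u, v: v == u + 1
print(relation2(successor, [1, ABSENT, 3], [2, ABSENT, 4]))
print(relation2(successor, [1, ABSENT], [2, 5]))   # second instant is not synchronous
```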


Down-Sampling. The SIGNAL equation Z := X When Y states that the signal
Z down-samples X when X is present and when Y is present with the value
true. When is the least fixpoint of the following continuous functional:


$$F_{When}(f) =_{def} \begin{cases} (\bot.X, \bot.Y) \mapsto \bot.f(X,Y) \\ (\bot.X, b.Y) \mapsto \bot.f(X,Y) \\ (v.X, \bot.Y) \mapsto \bot.f(X,Y) \\ (v.X, false.Y) \mapsto \bot.f(X,Y) \\ (v.X, true.Y) \mapsto v.f(X,Y) \end{cases}$$

Deterministic Merge. The SIGNAL equation Z := X Default Y states that X
and Y are merged in Z with the priority to X. Default is the least fixpoint of
the following continuous functional:


$$F_{Default}(f) =_{def} \begin{cases} (\bot.X, \bot.Y) \mapsto \bot.f(X,Y) \\ (\bot.X, v.Y) \mapsto v.f(X,Y) \\ (u.X, \bot.Y) \mapsto u.f(X,Y) \\ (u.X, v.Y) \mapsto u.f(X,Y) \end{cases}$$


Delay. The SIGNAL function Pre is used to access the previous value of a
signal. Pre is the least fixpoint of the following continuous functional:


$$F_{Pre}(f) =_{def} \begin{cases} (u, \bot.X) \mapsto \bot.f(u, X) \\ (u, v.X) \mapsto u.f(v, X) \end{cases}$$


Using the previously defined denotations of primitive processes, we can derive 
the denotations of the derived operators of SIGNAL. The parallel composition is 
denoted by the logical and of the underlying logic and the introduction of local 
signals is denoted by an existential quantifier. 
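
The behaviour of the three primitive operators above can be sketched on finite prefixes of signals. This is an illustration under stated assumptions, not the Coq formalization: signals are Python lists, absence is encoded as `None`, both arguments are assumed to have the same length, and the function names are hypothetical.

```python
ABSENT = None  # encoding of the "absent" instant; values are assumed to differ from None

def when(xs, ys):
    """Z := X When Y: X is kept only where Y is present with the value true."""
    return [x if (x is not ABSENT and y is True) else ABSENT
            for x, y in zip(xs, ys)]

def default(xs, ys):
    """Z := X Default Y: merge X and Y, with priority to X."""
    return [x if x is not ABSENT else y for x, y in zip(xs, ys)]

def pre(init, xs):
    """Delay: at each present instant, emit the previously present value."""
    out, previous = [], init
    for x in xs:
        if x is ABSENT:
            out.append(ABSENT)     # absent instants leave the stored value untouched
        else:
            out.append(previous)
            previous = x
    return out

print(when([1, ABSENT, 3, 4], [True, True, ABSENT, False]))
print(default([1, ABSENT, ABSENT], [7, 8, ABSENT]))
print(pre(0, [1, ABSENT, 2, 3]))
```

Each function follows the case analysis of the corresponding functional instant by instant; the list comprehension or loop plays the role of the recursive call f(X, Y).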

This co-inductive trace semantics of SIGNAL has been implemented with the 
proof assistant Coq (see [12] for details). Many lemmas are proved to ease the 
correctness proof of a reactive system specified with SIGNAL. The case study 
introduced in this paper confirms that our co-inductive approach is a natural, 
simple and efficient way to prove correctness of reactive systems. 
3 Steam-Boiler Control Specification Problem


In order to compare the strengths and weaknesses of different design formalisms
for reactive systems, the steam-boiler case study has been suggested by J.-R.
Abrial, E. Börger and H. Langmaack. In this section, we briefly recall its original
specification (see [2] for more details), and the clarifications we add to it
(see [11] for more details).


3.1 Physical Environment 


The physical environment is composed of several units (Fig. 1). Each one is 
characterized by physical constants and some of them provide data. 


[Figure 1: diagram of the physical units, with the following constants and provided data]

Steam: W (maximal outcome of steam), U1 (maximum gradient of increase), U2 (maximum gradient of decrease)
Steam measurement device: provided data: outcome of steam
Steam boiler: maximal capacity C, minimal limit M1, maximal limit M2, minimal normal N1, maximal normal N2
Pumps: capacity P; provided data: status
Pump controllers: provided data: flow
Water level measurement device: provided data: quantity of water

Fig. 1. Physical environment


3.2 Behaviour of the Steam-Boiler 


The program has to control the level of water in the steam-boiler. This quantity
should be neither too low nor too high; otherwise, the system might be damaged.
The program also has to manage the possible failure of physical units. For that
purpose, at every instant, it takes into account the global state of the physical
environment, which is denoted by an operation mode. The program follows a
cycle which takes place every five seconds. A cycle consists of the reception of
messages coming from the units, the analysis of the received information, and
the transmission of messages to the units. According to the operation mode, the
program decides at each cycle whether the system must stop. If not, it activates
or deactivates pumps in order to keep the level of water in the middle of the
steam-boiler.

The specification also gives additional information regarding the physical
behaviour of the steam-boiler. Namely, new values, called adjusted and calculated
values, are proposed. They enable sustained control of the system, by providing
a view of its dynamics, when a measurement device is defective.

At each cycle, adjusted variables contain either real measurements or ex- 
trapolated values which are calculated during the preceding cycle. An adjusted 
variable contains a real measurement when the corresponding device works pro- 
perly. Otherwise, it contains an extrapolated value. 

Calculated variables provide, at each cycle, extrapolated values of measure- 
ments for the following cycle. They contain the extreme values that are possibly 
reachable from the current adjusted values. 
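
The interplay between adjusted and calculated values over successive cycles can be sketched as follows. This is a simplified illustration with hypothetical names: an adjusted or calculated level is represented as a (minimum, maximum) pair, and the `extrapolate` argument is a placeholder for the extrapolation of the specification, here replaced in the usage example by a fixed widening of 10 units.

```python
def adjusted_level(measured, device_ok, calculated):
    """Adjusted (min, max) level for this cycle.

    The real measurement is used when the device works properly; otherwise
    the extrapolated pair computed during the preceding cycle is used.
    """
    if device_ok:
        return (measured, measured)
    return calculated

def run_cycles(readings, extrapolate, initial_calculated):
    """Thread the calculated values from one cycle to the next."""
    calculated = initial_calculated
    history = []
    for measured, device_ok in readings:
        adjusted = adjusted_level(measured, device_ok, calculated)
        history.append(adjusted)
        calculated = extrapolate(adjusted)   # extreme values reachable next cycle
    return history

# Placeholder extrapolation (an assumption): widen the interval by 10 units per cycle.
widen = lambda qa: (qa[0] - 10, qa[1] + 10)
readings = [(500, True), (0, False), (0, False), (480, True)]
print(run_cycles(readings, widen, (0, 0)))
```

While the device is defective the interval grows at each cycle; it collapses back to a single value as soon as a real measurement is available again.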


3.3 Precisions and Decisions about the Original Specification 


Because the original specification of the steam-boiler can be interpreted
flexibly, we first need to make some details more precise, both on
the physical behaviour of the steam-boiler and on the logical behaviour of its
implementation in SIGNAL. Our decisions concern the following items:


Distinction between pump failures and pump controller failures. We cannot rely
on the fact that controllers always provide reliable information about their
associated pumps. Indeed, according to the specification, failures of controllers
have to be taken into account and thus, we have to consider them as being
fallible. Consequently, how can pump failures and pump controller failures be
distinguished?

We could first try to detect the real throughput of each pump with an analysis
of water-level variations in the boiler. But such a method presupposes too
restrictive a set of conditions about the physical characteristics of pumps and
their controllers. Moreover, it actually makes controllers useless.

We have therefore chosen to determine what the real state of each pump
and controller should be, for each possible combination of values. This solution,
which seems to be the most reasonable and intuitive one, was proposed in [6] (a
solution of the steam-boiler problem in LUSTRE).


Message occurrences. In order to have more flexibility for controlling the steam- 
boiler, each pump and each controller is connected to the main program by its 
own communication line. Thus, each pump can be managed simultaneously and 
independently. 

Moreover, some incoming messages from pumps are not relevant
for the system at every instant. For example, a pump need not
provide its state if it did not receive a command during the preceding cycle. But
it can still provide its state at each cycle as specified in the original text. Only
the presence of compulsory messages will be checked.

In addition to these messages, we introduce a new message H. This message
is a pure signal and stands for the main clock of the program. Every signal involved
in the program has a clock which is a sub-clock of H. This signal is supposed
to be reliable. It makes it possible to detect the absence of compulsory messages.


Activation, deactivation of the pumps, and stop of the system. The decisions 
concerning the activation or the deactivation of the pumps, and the decision of 
stopping the system, are made according to the adjusted and calculated values. 
First, a specific decision is made for each pair of extreme levels, adjusted and
calculated. Then, the program globally decides whether the system shall stop.
If not, the program decides how the level shall move (up or down), if necessary,
taking into account each specific decision.

We calculate the best quantity of water to be provided, rather than just 
opening or closing all the pumps. Thus, at each cycle, the program calculates 
the optimal combination of open and closed pumps, in order to have an optimal 
progression of the level of water toward the middle of the boiler, taking into 
account failures of pumps and controllers. 
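
One way to picture the choice of an optimal combination of pumps is an exhaustive search over open/closed settings. The sketch below is a hypothetical illustration, not the SIGNAL process of the case study: it assumes that all quantities are expressed per cycle, that a pump known to have failed is never opened, and that "optimal" means "predicted level closest to the target"; all names are invented for the example.

```python
from itertools import product

def best_pump_combination(level, target, steam_out, capacities, usable):
    """Return the open/closed setting whose predicted level is closest to target.

    capacities[i] is the water volume pump i delivers during one cycle and
    usable[i] tells whether pump i (with its controller) is considered reliable.
    """
    best, best_gap = None, None
    for setting in product((False, True), repeat=len(capacities)):
        if any(opened and not ok for opened, ok in zip(setting, usable)):
            continue   # never rely on a pump that is known to have failed
        inflow = sum(c for c, opened in zip(capacities, setting) if opened)
        predicted = level + inflow - steam_out
        gap = abs(predicted - target)
        if best_gap is None or gap < best_gap:
            best, best_gap = setting, gap
    return best

# Four identical pumps; the level should rise from 400 toward 500.
print(best_pump_combination(400, 500, 20, [50, 50, 50, 50], [True] * 4))
print(best_pump_combination(400, 500, 20, [50, 50, 50, 50], [True, True, False, True]))
```

With four pumps the search space has only 16 settings, so enumeration is adequate; the second call shows that a failed pump is simply excluded from the candidates.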


3.4 Design and Architecture 


The steam-boiler controller in SIGNAL is composed of four main processes (fig. 2). 


— The IO_MANAGER process detects transmission failures. It implements a filter 
that guarantees the presence of the outgoing data, necessary to the proces- 
sing. This process also provides a signal which announces the manual stop 
of the system. 


— The FAILURE_MANAGER process is in charge of managing the dialogue bet- 
ween the physical units and the program regarding failure detections and 
repair indications. It detects failures and provides a global vision of the 
state of the physical system. 


— The DYNAMIC process directly implements the equations suggested in the
specification according to the detected failures and the values provided by
the measurement devices.


— The CONTROL process is the main program. Starting from the global vision 
of the state of the system, and from the adjusted measurements provided by 
the preceding process, it manages the operation modes, makes the decision 
to stop the system or not, and finally delivers activation and deactivation 
commands to the pumps. 
3.5 Motivation for the Choice of This Case Study


This case study is well adapted to our aim, i.e. to show the interest of the
SIGNAL-COQ formal approach. Indeed, the program has to handle several
physical parameters and it may use nonlinear numerical values (e.g. extrapolated
values of the level which take into account gradients of increase and decrease of
the steam throughput, i.e. typically nonlinear numerical terms). Thus, safety
properties cannot be simply and directly proved with a standard model checker.


4 Verification of the Steam-Boiler with Coq 


Proofs of program properties are built on the co-inductive trace semantics of
SIGNAL which has been implemented with Coq [13].

This axiomatization is a set of Coq libraries which gathers the modeling of
signals, the modeling of the primitives of the kernel language, and a number of
functions, predicates and theorems about signals. These Coq libraries, as well
as the proofs of the properties that are stated in the rest of this article, are
available at [12].


4.1 Safety Obligations 


Fig. 2. SIGNAL processes of the steam-boiler controller (diagram not reproduced; legible labels: H, IO_MANAGER, CONTROL, MODE, TRANS_FAILURE)

A global safety property can be informally stated in the following way:

    When a stop condition is satisfied, the system stops indeed, i.e. the
    program enters the emergency stop mode.


This statement implies several sub-properties. Our aim is to emphasize the
interest of using Coq for their verification. Thus, in the rest of this article, we
concentrate on safety sub-properties that cannot be directly
and simply proved by a standard model checker.


Since four stop conditions are specified, the global safety property has to be 
proved for each one: 


1. Manual stop. The program received consecutively the required number of 
STOP messages from the user for manually stopping the system. 


2. Critical level. The system is in danger because the water level is either too 
low or too high. 


3. Transmission failure. The program detected a transmission failure. 


4. Initialization. The water level measurement device is defective in the initia- 
lization mode. 


In our SIGNAL specification, these situations are associated with critical messa- 
ges. When one of these signals carries a value, the corresponding condition holds 
and so, the program must stop. 

First of all, the expected relations between these critical messages and the 
operation mode have to be checked. Then, we have to verify that each critical 
message is actually present when the condition to which it corresponds holds. 
For that purpose, the implied sub-properties are divided into two main classes: 


— A first class gathers properties that specify the correct behaviour of critical
messages, regarding the critical situations to which they correspond.

— The second class gathers properties that justify some simplifications or spe- 
cify the use of some internal signals in the processing. 


From now on, we only consider sub-properties coming from Manual stop and Critical
level because they involve parameters and nonlinear numerical values, unlike
Transmission failure and Initialization. They are thus convincing examples to
illustrate our approach. Moreover, Critical level gathers essential properties of our
solution.


4.2 Manual Stop 


The problem of manual stop is generalized, using a parameter called nb_stop, 
which stands for the number of STOP messages required for manually stopping 
the program, instead of the fixed value “3” initially suggested in the specification. 

Since we are using a proof assistant, we do not need to instantiate this
parameter with a particular value. First, a predicate that denotes the right behaviour
of a counter of the successive synchronous instants between two signals is
co-inductively defined in Coq. Then, we prove that our SIGNAL process indeed
provides a signal (called CPT) that behaves like a counter of the successive
synchronous instants between the stop signal (called STOP) and the main
clock (called H).


Instead of using a co-inductive predicate that denotes the expected behaviour
of CPT, we define a co-recursive function that specifies CPT. This function is
the least fixpoint of the following continuous functional:


$$F : (\mathbb{N} \times S_U \times S_V \to S_{\mathbb{N}}) \to (\mathbb{N} \times S_U \times S_V \to S_{\mathbb{N}})$$

$$F(f) =_{def} \begin{cases} (n, Cons(\bot, X), Cons(\bot, Y)) \mapsto Cons(\bot, f(n, X, Y)) \\ (n, Cons(x, X), Cons(\bot, Y)) \mapsto Cons(0, f(0, X, Y)) \\ (n, Cons(\bot, X), Cons(y, Y)) \mapsto Cons(0, f(0, X, Y)) \\ (n, Cons(x, X), Cons(y, Y)) \mapsto Cons(n+1, f(n+1, X, Y)) \end{cases}$$


Let cssm be the least fixpoint of F. The Coq definition of cssm is the following:
CoFixpoint cssm :
  (U,V:Set)nat->(Signal U)->(Signal V)->(Signal nat) :=
  [U,V:Set][n:nat][X:(Signal U)][Y:(Signal V)]Cases X Y of
      (Cons absent X') (Cons absent Y')
        => (Cons (absent nat) (cssm n X' Y'))
    | (Cons (present _) X') (Cons absent Y')
        => (Cons (present 0) (cssm 0 X' Y'))
    | (Cons absent X') (Cons (present _) Y')
        => (Cons (present 0) (cssm 0 X' Y'))
    | (Cons (present _) X') (Cons (present _) Y')
        => (Cons (present (S n)) (cssm (S n) X' Y'))
  end.
Using this function, the predicate that denotes the expected behaviour of CPT 
can now be stated: 


CPT = cssm(0, STOP, H) 


We open in Coq a section in which hypotheses are stated. Those hypotheses
correspond to the SIGNAL equations which are concerned by the property to be
proved:

(| CPT ^= H
 | CPT := ((ZCPT+1) when STOP) default (0 when H)
 | ZCPT := CPT$1 init 0
 | MANUAL_STOP := when (CPT=nb_stop)
 |)
Those equations use constant signals. We first have to define them explicitly.
Then, we have to state the hypothesis regarding H, the main clock of the
program. In particular, the clock of STOP is a sub-clock of H.


This yields the following equations:

0 | STOP ^< H
1 | CPT ^= H
2 | Cst0 := 0
3 | Cst0 ^= H
4 | CPT := ((ZCPT+1) when STOP) default (Cst0 when H)
5 | ZCPT := CPT$1 init 0
6 | A := (CPT=nb_stop)
7 | Csttrue := true
8 | Csttrue ^= A
9 | MANUAL_STOP := Csttrue when A


Using the co-inductive axiomatization of SIGNAL in Coq [13], this system of 
equations is translated into the following Coq hypotheses: 

Variable nb_stop : nat.
Variables CPT,ZCPT,Cst0 : (Signal nat).
Variables H,STOP,MANUAL_STOP,Csttrue : Clock.
Variable A : (Signal bool).

Hypothesis Equation0 : (OrderClock STOP H).
Hypothesis Equation1 : (Synchro CPT H).
Hypothesis Equation2 : (Constant 0 Cst0).
Hypothesis Equation3 : (Synchro Cst0 H).
Hypothesis Equation4 :
  CPT = (SignalAA_to_SignalA
          (default (when (fonction1 [n:nat](plus n (S 0)) ZCPT)
                         (Clock_to_Signal_bool STOP))
                   (when Cst0
                         (Clock_to_Signal_bool H)))).
Hypothesis Equation5 : ZCPT = (pre 0 CPT).
Hypothesis Equation6 : A = (fonction1 [n:nat](beq_nat n nb_stop) CPT).
Hypothesis Equation7 : (Constant tt Csttrue).
Hypothesis Equation8 : (Synchro Csttrue A).
Hypothesis Equation9 : MANUAL_STOP = (when Csttrue A).


In this environment, we aim at proving the following lemma:

Lemma l1 : CPT = (cssm 0 H STOP).


This property is too general for a model checker because of the nb_stop
parameter involved. It is also too restrictive for an inductive proof because of the
instantiated parameters (values “0”) involved in the cssm predicate and in the SIGNAL
pre term. A more general property must be stated with non-instantiated
parameters. Additional hypotheses about these formal parameters can also be stated.
For that purpose, the fifth SIGNAL equation of the previous specification is
replaced by the following, more general, one:

Variable ni : nat.

Hypothesis Equation5 : ZCPT = (pre ni CPT).


Then, the following lemma can be proved:

Lemma l1b : CPT = (cssm ni H STOP).


In particular, the initial property is verified. This is the first part of the property. 
Using the same method, we also prove that MANUAL_STOP provides a value 
when CPT reaches the nb_stop value. Finally, we prove that the program enters 
the emergency stop mode in this case. 

An important feature of the method outlined in this section is that verification 
constraints have no impact on the programming style: SIGNAL processes are 
translated naturally into Coq objects (without, e.g., any variable 
instantiation). 


4.3. Critical Water Level 


The property concerning the water level can be divided into several sub-properties 
which correspond to the different cases of critical level. Those properties involve 
parameters like the boiler capacity, the extremal limits of the level, or the 
nominal capacity of each pump. Moreover, the processing depends on the adjusted 
values. Thus, those properties are parameterized and concern non-linear 
numerical values. It is therefore not possible to verify them simply and directly 
with a standard model checker. 

First, a set of preliminary lemmas that justify some simplifications in 
the program has to be proved. For instance, the following statements allow us to 
eliminate some cases in the processing: 


$$\forall t \in \mathbb{N},\quad qc_1(t) < qc_2(t) \qquad (1)$$
$$\forall t \in \mathbb{N},\quad 0 \le qa_1(t) \le qa_2(t) \le C \qquad (2)$$


where $qa_1(t)$ and $qa_2(t)$ (resp. $qc_1(t)$ and $qc_2(t)$) stand for the minimal and 
maximal adjusted (resp. calculated) values of the level at instant t, and where 
C stands for the maximal capacity of the boiler. Indeed, the process in charge of 
making a decision about activations of the pumps relies on a list of the different 
possible interleavings of extrapolated and adjusted levels. But some of them are 
omitted because of the statements (1) and (2). So they have to be proved. 

The adjusted values $qa_1(t)$ and $qa_2(t)$ depend on the calculated values $qc_1(t)$ and 
$qc_2(t)$, which are defined as follows: 


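$$\forall t \in \mathbb{N}^*,\quad qc_1(t) = qa_1(t-1) - va_2(t-1)\,\Delta t - \tfrac{1}{2} U_1 \Delta t^2 + pa_1(t-1)\,\Delta t \qquad (3)$$
$$\forall t \in \mathbb{N}^*,\quad qc_2(t) = qa_2(t-1) - va_1(t-1)\,\Delta t + \tfrac{1}{2} U_2 \Delta t^2 + pa_2(t-1)\,\Delta t \qquad (4)$$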


where $va_1(t)$ and $va_2(t)$ stand for the adjusted values of the outcome of steam, 
and $pa_1(t)$ and $pa_2(t)$ stand for the adjusted values of the cumulated throughput 
of the pumps. The parameters $U_1$ and $U_2$ denote the maximum gradients of 
increase and decrease of the outcome of steam. 
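
As a concrete reading of equations (3) and (4), the following Python sketch computes the calculated bounds for the next cycle from the adjusted values of the current one. The function name, the argument names and the sample numbers in the usage note are illustrative assumptions; they are not taken from the SIGNAL program.

```python
def calculated_levels(qa1, qa2, va1, va2, pa1, pa2, u1, u2, dt):
    """Return (qc1, qc2), the calculated level bounds for the next cycle.

    Direct transcription of equations (3) and (4); all names are hypothetical.
    """
    # Lower bound: start from the minimal level, remove the maximal steam
    # output (va2) plus its largest possible increase (u1), add the minimal
    # pump throughput (pa1).
    qc1 = qa1 - va2 * dt - 0.5 * u1 * dt * dt + pa1 * dt
    # Upper bound: start from the maximal level, remove the minimal steam
    # output (va1) less its largest possible decrease (u2), add the maximal
    # pump throughput (pa2).
    qc2 = qa2 - va1 * dt + 0.5 * u2 * dt * dt + pa2 * dt
    return qc1, qc2
```

For example, with qa1 = 400, qa2 = 420, va1 = 10, va2 = 12, pa1 = 5, pa2 = 6, U1 = U2 = 1 and a cycle length of 5, the sketch gives the interval (352.5, 412.5), which is consistent with statement (1).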

In order to prove a property equivalent to the statement (1) with a model 
checker, the processing would have to be changed radically. For instance, the 
interval of possible values could be divided into several sub-levels and then new 
boolean properties about the reachability of those levels could be defined. And 
in every case, all parameters like U1, U2 or C would have to be instantiated. With 
our SIGNAL-Coq approach, we do not have to consider those verification problems 
during the design of the program. Calculated values are stated textually 
(cf. (3) and (4)) in SIGNAL: 


  QC1 ^= QC2 
| QC1 := QA1 - (VA2*Dt) - (0.5*U1*Dt*Dt) + (PA1*Dt) 
| QC2 := QA2 - (VA1*Dt) + (0.5*U2*Dt*Dt) + (PA2*Dt) 
| VC1 ^= VC2 
| VC1 := VA1-(U2*Dt) 
| VC2 := VA2+(U1*Dt) 


Note that the calculated values concern the following cycle. The definitions of 
the adjusted values follow naturally from the calculated values of the preceding 
cycle: 


ZQC2 := QC2$1 init C 
ZQC1 := QC1$1 init 0.0 
QA2 := (Q when J_OK) default ZQC2 
QA1 := (Q when J_OK) default ZQC1 
ZVC2 := VC2$1 init 0.0 
ZVC1 := VC1$1 init 0.0 
VA2 := (V when U_OK) default ZVC2 
VA1 := (V when U_OK) default ZVC1 


Signals Q and V carry the values coming from the measurement devices. Signals 
J_OK and U_OK provide, at each cycle, boolean information about the physical 
state of the measurement devices. We just have to translate these SIGNAL 
equations into Coq hypotheses, and we prove the properties (1) and (2) using 
co-induction. Coq offers a natural syntax for manipulating such numerical objects. 
For instance, consider the following statement: 

$$\forall x, y \in \mathbb{Z},\quad (0 \le x) \Rightarrow (0 < y) \Rightarrow (0 < x + y)$$

Using the ZArith library of Coq, the definition of this statement is the following: 

(x,y:Z)(Zle ZERO x)->(Zlt ZERO y)->(Zlt ZERO (Zplus x y)) 

Meanwhile, the ZArith library also provides syntactic facilities. Thus, we have 
an equivalent way to define this statement: 

(x,y:Z) `0 <= x` -> `0 < y` -> `0 < x+y` 

Such a syntax is more intuitive, so proving equations or inequations on Z 
in Coq is much easier. 


The following first lemma is very simple to prove: 

Lemma I_LN : (a,b,c,d,e:Z) `a <= b` -> `c <= d` -> 
`0 < 2*(b-Dt*c+Dt*e)+U2*(Dt*Dt) - 2*(a-Dt*d+Dt*e)-U1*(Dt*Dt)`. 
And then it is used to prove the statement (1): 
CoInductive Globally2 [U,V:Set;P: (Stream U)->(Stream V)->Prop]: 
(Stream U)->(Stream V)->Prop := 
globally2 : (X: (Stream U))(Y:(Stream V))(P X Y) 
->(Globally2 P (tl X) (tl Y))->(Globally2 P X Y). 


This Coq statement defines a co-inductive predicate which implements 
the “□” (always) connector of temporal logic. Indeed, in our co-inductive 
semantics of SIGNAL, we cannot handle explicit temporal indexes (see [13] for 
more details). 


Definition ltSt := [X: (Signal Z)] [Y: (Signal Z)] 
(x,y:Z) (hd X)=(present x)->(hd Y)=(present y)->(Zlt x y). 


This statement defines the predicate that will be applied to the Globally2 
connector. 
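
To make the shape of Globally2 and ltSt concrete, here is a small Python sketch that checks the same condition on finite prefixes of two signals. It is only a bounded test, not a co-inductive proof; encoding an absent value as None and the function names are assumptions made for illustration.

```python
def lt_st(xs, ys):
    """ltSt on finite prefixes: if both heads are present, the first is smaller.

    An absent value is encoded as None (an assumption of this sketch).
    """
    x, y = xs[0], ys[0]
    return x is None or y is None or x < y


def globally2_prefix(p, xs, ys):
    """Check p on every pair of aligned, non-empty suffixes of two prefixes.

    This mirrors the unfolding of Globally2 (P X Y, then the tails), but only
    up to the length of the shorter prefix.
    """
    n = min(len(xs), len(ys))
    return all(p(xs[i:], ys[i:]) for i in range(n))
```

With the prefixes [1, None, 3, 4] and [2, 5, None, 9] the check succeeds, because the heads are compared only at the instants where both values are present.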


Theorem QA1_lt_QA2 : (Globally2 ltSt QC1 QC2). 


This statement is equivalent to the statement (1). 

The decision to stop the system because of a critical level is 
based on the adjusted levels. Using the preceding theorem, it is very simple 
to prove the following property: 


$$\forall t \in \mathbb{N},\quad qa_1(t) \le q(t) \le qa_2(t) \qquad (5)$$


where q stands for the real level in the boiler. It means that even if a measurement 
device is defective, the program always knows the interval of possible current 
levels. Moreover, the program knows the interval of possibly reachable levels for 
the next cycle. Regarding these intervals, we have to check that the level is never 
likely to reach a critical value. For instance we have: 


$$\forall t \in \mathbb{N},\quad (qa_1(t) < M_1 \wedge qc_1(t) < M_1) \Rightarrow \mathit{Critical\_Level}(t) = T \qquad (6)$$


It means that the program will stop (the critical message Critical_Level carries a 
value T) if the minimal next level is below $M_1$ (which is the minimal level under 
which the system is in danger after one cycle) while the current level is possibly 
already below $M_1$. 

We also prove properties like the following one: 


$$\forall t \in \mathbb{N},\quad (qc_1(t) < M_1 \wedge qc_2(t) > M_2) \Rightarrow \mathit{Critical\_Level}(t) = T \qquad (7)$$


It means that if the interval of possibly reachable levels for the next cycle is too 
wide for making a safe decision, the program stops. 

These examples emphasize an important advantage of our approach. The 
statements of the expected safety properties are especially clear. Moreover, the 
programmer does not need to keep in mind, during the design phase, which kinds of 
properties are checkable and which are not. Thus, specifying, programming and 
verifying a problem are more natural and intuitive operations. 

Unlike a model checker, a proof assistant, and more generally a theorem 
prover, cannot provide a counter-example when the check fails. But Coq gives 
a strong logical framework in which the user acquires great confidence in 
the conformity of the program to the specification. Moreover, if the program is 
erroneous, the proof will get stuck on an impossible sub-goal which is 
often explicit enough to reveal the mistake. 

Nevertheless, theorem proving is often less efficient and more tedious 
than model checking. Thus, even if we could check all properties with only a 
proof assistant like Coq, the optimal solution for verification consists in using 
a model checker as much as possible and in using a theorem prover when a 
property is out of the scope of the model checker. 


5 Related Works 


The steam-boiler problem has become a classical case study for testing and 
comparing formal methods. It has been entirely specified and proved with the B 
tool approach ([1]). In [6], a steam-boiler has been implemented in the 
synchronous data-flow language LUSTRE (quite similar to SIGNAL) and verified with 
its model checker LESAR, which supports the verification of safety properties. 
This approach makes it possible to prove boolean safety properties but cannot 
deal with numerical and parameterized properties. In [3], the semantics of LUSTRE 
has been formalized in the theorem prover PVS, but co-induction is not used to 
represent infinite signals. The solution proposed in the LUSTRE-PVS approach 
consists of viewing signals as infinite sequences. In this setting, a signal is 
represented by a function which associates any instant i (a natural number) with 
the value v of the signal (if it is present) or with ⊥ (if it is absent). The 
declarative and equational style of SIGNAL is similar to LUSTRE. However, LUSTRE 
programs always have a unique reference of logical time: they are endochronous. 
SIGNAL specifications differ from LUSTRE programs in that they can be exochronous 
(i.e. they can have many references of logical time). For instance, the process 
x:=1 | y:=2 does not constrain the clocks of x and y to be equal. Hence, had we 
used functions over infinite sequences to represent signals, we would have faced 
the burden of having to manipulate several, possibly unrelated, indexes of time i. 


6 Conclusion 


The axiomatization of the trace semantics of SIGNAL within the proof assistant 
Coq offers a novel approach for the validation of reactive systems. 

We demonstrate the benefits of this formal approach for specifying and 
verifying properties of reactive systems by considering a large-scale case study, 
the steam-boiler controller problem. 
Disregarding any compromise between the modeling tools and the modeled 
system, we augmented the original specification of the steam-boiler of [2] with a 
more precise description of the physical environment. 

This case study proves to be well suited to the evaluation of the SIGNAL-Coq 
formal approach, allowing the modeling of parameterized strong safety 
properties with non-linear numerical constraints. In spite of the heavy 
involvement required of the user during the proof-checking process, it appears 
that the use of a proof assistant like Coq has many advantages. 

In addition to the fact that the approach removes any limitation in the 
expression of properties, it makes it possible to acquire strong confidence in 
the system being specified. Moreover, it is worth noting that experience in using 
Coq led to the development of libraries which improved the efficiency of later 
proofs. 

However, this approach is interesting only with properties that cannot be 
directly proved by a model checker. It is thus advisable to use a proof assistant 
in complement to more classical approaches to check these particular (e.g. para- 
meterized, co-inductive, non-linear) properties. In conclusion, the integration of 
model-checking and theorem-proving within a unified framework seems to be a 
promising prospect. 
Abstract. In this paper we present an approach for modelling functional 
procedures (as they occur in imperative programming languages) in 
a weakest precondition framework. Functional procedures are called in- 
side expressions, but the body of a functional procedure is built using 
standard specification/programming syntax, including nondeterminism, 
sequential composition, conditionals and loops. We integrate our theory 
of functional procedures into the existing mechanisation of the refine- 
ment calculus in the HOL system. To make formal reasoning possible, 
we derive correctness rules for functional procedures and their calls. We 
also show how recursive functional procedures can be handled according 
to our approach. Finally, we provide a nontrivial example of reasoning 
about a recursive procedure for binary search. 


1 Introduction 


A procedure is a parameterised piece of code that can be called from another 
program. In imperative programming two kinds of procedures are encountered, 
which differ mainly in the way in which they are called. A call to an ordinary 
procedure is itself a program statement, while a call to a functional procedure is 
an expression. Thus, calls to functional procedures occur inside other expressions 
(in the right-hand side of an assignment or in the guard of a conditional or a 
loop). Many languages support functional procedures, but still they have been 
ignored in most theories of programming, such as Hoare logic or the refinement 
calculus. These theories typically do not treat expressions at all, assuming that 
the underlying logic handles them sufficiently. 

In this paper we describe how functional procedures can be handled in a 
weakest-precondition framework, where programs are identified with predicate 
transformers. The existing mechanisation of predicate transformers in higher- 
order logic [12,3] provides a foundation for our work, and we have integrated 
our theory of functional procedures into the existing mechanisation in the HOL 
system. Thus we have a framework for reasoning in a mechanised logic about 
imperative programs that contain definitions of and calls to functional proce- 
dures. To make such reasoning possible in practice, we derive rules that reduce 
reasoning about the calling program to correctness reasoning about the body of 
the functional procedure. Special emphasis is put on retaining the original wea- 
kest precondition framework, where the semantics of expressions is compatible 
with the assignment axiom of Hoare Logic. 
We model functional procedures in their full generality; thus the body of 
a functional procedure can be built using standard specification syntax in the 
style of Dijkstra’s guarded commands [4], including nondeterminism, sequential 
composition, conditionals and loops. Recursive procedures constitute a special 
challenge, but we show how they can be handled, and provide a nontrivial exam- 
ple of reasoning about a recursive procedure for binary search. Our framework 
assumes that functional procedures have no side-effects or reference parameters, 
and mutual recursion is not supported. 


2 Predicate Transformer Semantics 
in Higher-Order Logic 


This section briefly reviews the background to the paper, i.e., the formalisation of 
a weakest-precondition semantics in the higher-order logic of the HOL theorem 
proving system. For more details, we refer to earlier work [12]. 


2.1 Predicate Transformers 


The program state can be modelled as a polymorphic type variable α or β which 
for concrete programs can be instantiated in different ways. At this point we do 
not assume that states have any specific internal structure, though we introduce 
some assumptions later, in connection with program variables. 

A state predicate is a boolean function on states (i.e., it has type α → bool; 
we identify predicates with sets of states). A predicate transformer is a function 
that maps predicates to predicates. The intended interpretation is that of a 
predicate transformer S : (β → bool) → (α → bool) as a weakest precondition. 
This means that if q is a predicate (a postcondition) and σ is a state then σ ∈ S q 
if and only if execution of a program statement modelled by S from initial state 
σ is guaranteed to terminate in some final state σ′ in q. 

We do not assume any specific syntax for program statements. What we de- 
velop is a framework that can be used for any programming notation with a 
weakest precondition semantics. In examples, we will use a notation with assig- 
nments (deterministic and nondeterministic), sequential composition, conditio- 
nals, while-loops and blocks. The weakest precondition semantics for this nota- 
tion has been embedded as described in [12]. For example, sequential composition 
and conditional composition are defined as follows: 


⊢def ∀c1 c2 q. (c1 seq c2) q = c1 (c2 q) 
⊢def ∀g c1 c2 q. cond g c1 c2 q = (λs. g s → c1 q s | c2 q s) 


The embedding of the programming notation is shallow, so it can be extended 
whenever we want to add new features, as long as these can be given a weakest 
precondition semantics (see Section 3.5). 

The predicate transformers that model (demonically) nondeterministic programs 
satisfy the following two healthiness conditions: 


⊢def strict c = (c false = false) 
⊢def conjunctive c = ∀P. c (glb P) = glb (λp. ∃q. P q ∧ p = c q) 


where false is the everywhere false predicate λu. F and glb is the greatest lower 
bound (intersection) operator on predicates: 


⊢def ∀P. glb P = (λs. ∀p. P p ⇒ p s) 


For a given program (predicate transformer), these healthiness conditions are 
easily proved automatically by a structural argument. Correctness (i.e., total 
correctness in the weakest-precondition sense) is formalised as follows: 


⊢def correct p c q = p implies (c q) 


where implies models the implication (subset) ordering on predicates. Thus, 
correct p c q holds if any execution of c from an initial state in p is guaranteed 
to terminate in a final state in q. 


2.2 Modelling Assignments 


One of the basic statements of any imperative language is the assignment 
statement, which models a deterministic state change. It can be represented using a 
state (update) function e of type α → β where α is the type of the initial state 
and β is the type of the final state. The assignment statement is then defined 
(according to its weakest precondition semantics) as: 


⊢def ∀e q. assign e q = (λv. q (e v)) 


Note that the initial and final states can be of different types. This will be crucial 
when modelling functional procedures. 

A nondeterministic assignment (specification) statement describes a state 
change where the result may not be uniquely determined. All we know is that 
the relationship P between initial and final state should be established. The 
definition of nondeterministic assignment is the following: 


⊢def ∀P q. nondass P q = (λv. ∀v′. P v v′ ⇒ q v′) 


In a concrete program, the state is a tuple (i.e., of product type) where 
every component corresponds to a program variable. We can use tupled 
λ-abstraction to make programs readable. In a state space with three variables 
x: num, b: bool, y: num, the assignment x := x + y is described as 


assign λ(x,b,y). (x+y,b,y) 

rather than the equivalent but less readable 

assign λu. (FST u + SND(SND u), FST(SND u), SND(SND u)) 


For details about this way of modelling program variables, we refer to [3]. 
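
The definitions of this section can be exercised on small finite state spaces. The following Python sketch models predicates as boolean functions and predicate transformers as functions on predicates, as in the HOL definitions; the explicit `states` and `finals` arguments, which replace HOL's quantification over all states, and the example program are assumptions of the sketch.

```python
def assign(e):
    """assign e q = (λv. q (e v))"""
    return lambda q: lambda v: q(e(v))


def nondass(rel, finals):
    """nondass P q = (λv. ∀v'. P v v' ⇒ q v'), with v' ranging over `finals`."""
    return lambda q: lambda v: all(q(w) for w in finals if rel(v, w))


def seq(c1, c2):
    """(c1 seq c2) q = c1 (c2 q)"""
    return lambda q: c1(c2(q))


def cond(g, c1, c2):
    """cond g c1 c2 q = (λs. g s → c1 q s | c2 q s)"""
    return lambda q: lambda s: c1(q)(s) if g(s) else c2(q)(s)


def correct(p, c, q, states):
    """correct p c q: every state in p (among `states`) lies in c q."""
    wp = c(q)
    return all(wp(s) for s in states if p(s))


# The assignment x := x + y on the state space (x, b, y).
add_y = assign(lambda s: (s[0] + s[2], s[1], s[2]))
STATES = [(x, b, y) for x in range(4) for b in (False, True) for y in range(4)]
```

For instance, `correct(lambda s: s[2] >= 1, add_y, lambda s: s[0] >= 1, STATES)` holds, while the same postcondition fails under the precondition that is true everywhere, because of the states with x = y = 0.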
3 Functional Procedures 


The general purpose of a procedure is to abstract a certain piece of code, giving 
it a name and then adapting it (through parameters) in different places of the 
program. Since procedures are program fragments, they can be modelled in the 
usual way, i.e., as predicate transformers [5]. 

The effect of calling an ordinary procedure is that the procedure body (ad- 
apted as described by the parameters) is executed in place of the procedure call. 
However, the effect of calling a functional procedure is that a value is retur- 
ned, and calls to functional procedures appear inside expressions in assignments 
and guards. Thus, a call to a functional procedure cannot be replaced by the 
procedure body. 


3.1. The Function Call Operator 


Since a functional procedure is really a program fragment, we want to model it as 
a predicate transformer. Predicate transformers map predicates to predicates, but 
intuitively they stand for program statements which transform (initial) states 
into (final) states. Because a state is a tuple, we can interpret a state as a 
value. Since the formalisation furthermore allows the initial and final states to 
be of different types, we can interpret the final state of an execution of a 
functional procedure as the return value, to be substituted for the call. 

To make this work, we have to find a way of extracting the state function 
from the procedure body (the predicate transformer). Operationally we can see 
the body as modelling backward execution - we supply a set of final states 
we are interested in (postcondition), and calculate the biggest possible set of 
initial states (weakest precondition) from which we are guaranteed to reach the final 
states described by the postcondition. State functions, however, model forward 
execution — for a given initial state they calculate the final state that is the 
result of the state change. We need to find a translation that reverses execution 
modelled by the functional procedure body. 

For a given initial state we consider all possible sets of reachable final states 
(postconditions). We then calculate the intersection of all such sets of states 
(the minimal set of reachable final states), and finally we select a value from this 
minimal set as the result of our state function. 

This intuition is formalised in the following definition: 


⊢def ∀c. fcall c = (λu. εv. glb (λq. c q u) v) 


where c is the body of the functional procedure, u is the initial state (the argu- 
ments), and v is the result state (the value). 

Note the use of the choice operator ε in the definition of fcall. It means that 
the result value of a function call is an arbitrary (but fixed) element from the 
set glb (λq. c q u). If this set is empty, then we have no information whatsoever 
about the value that is returned. However, conjunctivity and strictness (the two 
healthiness conditions that we generally require) together guarantee that the set 
is nonempty. 
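
The construction can be tried out on a small finite set of result values. The Python sketch below enumerates every postcondition on that set, intersects those that the body establishes from the given argument, and picks one element. Several points are assumptions of the sketch and not part of the HOL theory: the finite set `finals` must contain every value the body can return, `min` stands in for the choice operator ε, and an empty intersection raises an error where HOL would simply return an unspecified value.

```python
from itertools import combinations


def assign(e):
    """assign e q = (λv. q (e v))"""
    return lambda q: lambda v: q(e(v))


def nondass(rel, finals):
    """nondass P q = (λv. ∀v'. P v v' ⇒ q v'), with v' ranging over `finals`."""
    return lambda q: lambda v: all(q(w) for w in finals if rel(v, w))


def all_predicates(finals):
    """Enumerate every predicate on the finite set `finals` as a frozenset."""
    items = list(finals)
    for k in range(len(items) + 1):
        for subset in combinations(items, k):
            yield frozenset(subset)


def fcall(c, finals):
    """Finite-model sketch of fcall c = (λu. εv. glb (λq. c q u) v)."""
    def call(u):
        # glb (λq. c q u): intersect all postconditions established from u.
        reachable = set(finals)
        for q in all_predicates(finals):
            if c(q.__contains__)(u):
                reachable &= q
        if not reachable:
            # Only possible when the body is not strict and conjunctive.
            raise ValueError("empty set of results")
        return min(reachable)  # arbitrary but fixed choice, standing in for ε
    return call
```

With `sq = assign(lambda u: u * u)` and `finals = range(10)`, the call `fcall(sq, range(10))(3)` returns 9, since the intersection is the singleton {9}. For a nondeterministic body the result is some element of the set of permitted results.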
As an example, we define a very simple functional procedure that squares a 
natural number as follows: 


⊢def sqfun = assign (λu. u * u) 


Note that the assignment in the definition of sqfun plays the role of a return 
statement. To make this association explicit we introduce return as an alter- 
native name for assign, to be used as the last statement to be executed in the 
body of a functional procedure. 

In a Pascal-like syntax this corresponds to something like the following: 


func sqfun(x : num) : num = 
    return x * x 

A call to this functional procedure can then be as follows: 

assign λ(x,y). (x, y + fcall sqfun (x+1)) 

corresponding to an assignment of the form 

y := y + sqfun(x + 1) 


Note that we make no assumption about the syntax of expressions; any HOL 
expression can be used as the right-hand side of an assignment and (because our 
embedding is shallow) if new types and constants are defined, these can also be 
used without changing the theory described here. 


3.2. Basic Properties 


We shall now discuss a number of basic properties of the function call operator 
that we have proved as HOL theorems. We start with a basic soundness property: 
if the body of the functional procedure is an assignment statement then the 
function call extracts the state change function from it: 


fcall_assign = 
⊢ ∀e. fcall (assign e) = e 


The proof rests on the fact that in this case, the intersection 
glb (λq. (assign e) q u) is the singleton set {e u} where u is the initial 
(argument) state of the functional procedure. Therefore, the choice operator 
actually has no choice but to select e u as the result value of the function call. 

Implicitly the same property holds for any deterministic and terminating 
functional procedure body, since such a statement is semantically equivalent to 
a single assign statement. 

Next, we have a property that allows us to show that a functional procedure in 
fact implements a specific function. 


fcall_thm = 
⊢ ∀c f. conjunctive c ∧ strict c ∧ 
    (∀u0. correct (λu. u = u0) c (λv. v = f u0)) ⇒ 
    (fcall c = f) 
Here, c is the body of the functional procedure and f is (the HOL formalisation 
of) the function that the procedure implements. The implementation property 
is reduced to a corresponding correctness property of the procedure body, which 
can then be proved using standard (Hoare logic) methods. 

Finally, we have two theorems that show how the function call operator can 
be propagated past an initial assignment and distributed into a conditional: 


fcall_seq = 
⊢ ∀c e. fcall (assign e seq c) s = fcall c (e s) 
fcall_cond = 
⊢ ∀g c1 c2 s. 
    fcall (cond g c1 c2) s = (g s → fcall c1 s | fcall c2 s) 


These theorems will be important when proving properties of concrete imple- 
mentations. They could also support a kind of partial evaluation using the actual 
parameters of a function call. 


3.3. Example: Implementation Proofs 


Consider the very simple task of finding the minimum of two numbers. In HOL 
we can formalise the minimum function in the following way: 


⊢def ∀m n. MIN (m,n) = (m < n → m | n) 


In the imperative programming notation we now code a (slightly different) 
functional procedure minfun 


⊢def minfun = 
    cond (λ(x,y). x < y) 
         (return λ(x,y). x) 
         (return λ(x,y). y) 


The two arguments of the functional procedure minfun form the initial state 
(pair). The variables here are explicitly modelled as projection functions FST 
and SND. In a Pascal-style programming notation this would translate to 


func minfun(x : num, y : num) : num = 
    if x < y then 
        return x 
    else 
        return y 
    endif 


Now we can prove that minfun actually implements the HOL function MIN. 


⊢ ∀x y. fcall minfun = MIN 
The proof is straightforward: first we use the theorem fcall_cond to distribute 
fcall into the conditional statement, then we eliminate fcall using the basic 
property fcall_assign. After this follows a case split and then the proof is finis- 
hed off by arithmetic reasoning. Note that this kind of implementation theorem 
is very strong: when reasoning about a program that contains a function call, we 
can replace the function call with the mathematical function that it corresponds 
to. Thus, we never have to refer to the definition of minfun after this. 


3.4 Nontermination 


It is natural to expect functional procedures in our framework to be deterministic 
and terminating, since they typically implement (total) functions. The theorem 
fcall_assign shows that in this case the function call extracts the implemented 
function. 

However, our formalisation of functional procedures is more general than 
this, because it allows function bodies that are nondeterministic and/or nonter- 
minating. An obvious question is now: what do we know about the value that a 
functional procedure returns in these cases? 

Since all expressions in HOL are total, a nonterminating function body does 
not lead to a nonterminating function call, but we get a return value about which 
we know absolutely nothing (apart from type information). This is obviously a 
shortcoming (although it is close to the view generally taken in systems based on 
partial correctness). The weakest precondition semantics identifies aborting and 
nonterminating behaviour, and one could argue that our implementation is com- 
patible with such a view, when aborting behaviour is interpreted as behaviour 
about which we have (and can have) no information. 

A way of avoiding this problem is to interpret an assignment x := e as 
implicitly preceded by an assertion {dom(e)} which aborts if e contains any 
nonterminating (aborting) function calls. One step even further away from what 
we are trying to model would be to treat functional procedures simply as ordinary 
procedures with a result parameter and to interpret the assignment as a block 
with a local variable for each function call and the return values computed and 
stored before the assignment itself is executed. However, then we would definitely 
not be modeling functional procedures in the way that we set out to do. 


3.5 Nondeterminism 


Now consider the case when the function body is (demonically) nondeterministic 
(but terminating). A simple case is when the procedure body consists of a single 
nondeterministic assignment statement nondass R. In this case, if for some given 
initial state (function parameters) u the set Ru is not empty, then some selected 
element from the set Ru is returned by the function call: 


⊢ ∀R u. (∃v. R u v) ⇒ R u (fcall (nondass R) u) 
A similar argument can also be used when the body of the nondeterministic 
functional procedure is more complex, since it is then equivalent to some 
nondeterministic assignment. 

Thus, (demonic) nondeterminism models underspecification (implementation- 
time nondeterminism rather than run-time nondeterminism). In fact, we could 
also permit angelic nondeterminism in the procedure body; this would corre- 
spond to having an oracle make decisions about the execution [1]. 


4 Correctness Reasoning with Functional Procedures 


The usual way to prove that a program (or some program fragment) is correct 
with respect to a given precondition-postcondition pair is to decompose the glo- 
bal correctness property into correctness properties for the program components, 
using Hoare logic. 


4.1 Correctness Proofs with Functional Procedures 


When proving correctness of a program containing function calls, we use Hoare 
logic to decompose the proof in the ordinary way. When we get to the bottom 
level, we are faced with proving verification conditions that come from guards 
and assignments (e.g., conditions of the form P ⇒ Q[x := E] that come from 
the assignment rule of Hoare logic). When function calls are present, such a 
condition expresses a relationship between the calling state, the argument to the 
function, and the result returned by the function call. The following theorem can 
then be used to reduce the condition to a correctness condition on the function 
body: 


fcall_property = 
⊢ ∀c R e u0. 
    conjunctive c ∧ strict c ⇒ 
    correct (λu. u = e u0) c (λres. R u0 res) ⇒ 
    R u0 (fcall c (e u0)) 


Here c is the function body, and R expresses the relationship between the state 
from which the function is called (u0) and the function result (e is the function 
that says how the function argument is constructed from the calling state). Note 
also that fcall_property is a generalisation of fcall_thm (see Section 3.2). 
Since conjunctivity and strictness can be proved automatically, it gives a way of 
reducing a general property of a function call to a correctness property for the 
body of the functional procedure in question. 

Let us use the squaring function to show how this is used in practice. Recall 
that it was defined to satisfy 


⊢ sqfun = return (λu. u * u) 


Suppose that we want to prove the following correctness assertion for the assig- 
nment statement with the function call: 
⊢ correct (λ(x,y). T) 
    (assign λ(x,y). (x, fcall sqfun (x + 1) - 1)) 
    (λ(x,y). y ≥ x) 


Here the tupled abstraction makes clear the correspondence with the intended 
Hoare logic formula 


{T} y := sqfun(x + 1) - 1 {y ≥ x} 


After applying the Hoare logic rule for assignment and simplifying, the goal 
is reduced to the following: 


⊢ fcall sqfun (x + 1) - 1 ≥ x 


We can now specialise the theorem fcall_property with the following: sqfun 
for c, (λ(x,y) r. r - 1 ≥ x) for R, (λ(x,y). x + 1) for e, and 
x for u0. This theorem reduces our goal (after the conjunctivity and strictness 
conditions have been automatically discharged) to 


⊢ correct (λu. u = x + 1) (return (λu. u * u)) (λr. r - 1 ≥ x) 


Now we apply the Hoare logic rule (recall that the return statement is an 
assignment) and the goal is reduced to 


⊢ (x + 1) * (x + 1) - 1 ≥ x 


which is a standard verification condition (and obviously true). 
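
The final condition can also be checked by hand: (x + 1) * (x + 1) - 1 = x * x + 2 * x, which is at least x for every natural number x. As a quick sanity check, and not as a substitute for the HOL proof, the following Python sketch tests the condition on an initial segment of the naturals; the function names and the bound used in the usage note are illustrative assumptions.

```python
def sqfun(u):
    """The function implemented by the functional procedure sqfun."""
    return u * u


def vc_holds(x):
    """Verification condition obtained above: sqfun(x + 1) - 1 >= x."""
    return sqfun(x + 1) - 1 >= x


def check_vc(bound):
    """Test the verification condition for all naturals below `bound`."""
    return all(vc_holds(x) for x in range(bound))
```

Calling `check_vc(1000)` exercises the condition for x from 0 to 999.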

The functional procedure sqfun was very simple, but the same strategy works 
for more complex procedure bodies (that include, e.g., loops) and more complex 
correctness conditions as well. The example shows how the verification conditions 
that arise from program correctness proofs lead to new correctness proofs, when 
function calls are present. Since the body of one function may contain calls to 
another function, new correctness conditions may appear, and so on. Eventually, 
however, all function calls have been handled and we reach the ground level where 
only basic verification conditions remain (unless there is recursion; see Section 
5). 


4.2 Contextual Correctness Reasoning 


A function call can occur in a situation where a context (i.e., a restriction on 
the possible values of variables) is known to hold. If this contextual knowledge 
can be expressed in the form of a predicate p that holds for the arguments at 
a call to c, then we can assume p as a precondition when reasoning about the 
body of the functional procedure c. 

The theorem that captures this intuition is the following, in the case of an 
implementation proof: 


fcall_thm_pre =
⊢ ∀c p f.
    conjunctive c ∧ strict c ∧
    (∀u0. correct (λu. (u = u0) ∧ p u) c (λv. v = f u0)) ⇒
    (∀u. p u ⇒ (fcall c u = f u))


A simple example of a situation where this property can be useful is when the 
function call occurs inside the guard of a conditional, e.g., 


cond (λ(x,y). x > 0 ∧ fcall foo x) ...


In this case, we may instantiate p to (λu. u > 0) when using fcall_thm_pre to
reason about the call to foo.

A similar argument can also be used when proving some general property of
a function call (e.g., when reducing correctness conditions):


fcall_property_pre =
⊢ ∀c p R e u0.
    conjunctive c ∧ strict c ∧
    correct (λu. (u = e u0) ∧ p u) c (λv. R u0 v) ⇒
    p (e u0) ⇒ R u0 (fcall c (e u0))


This is a direct generalisation of fcall_property (Section 4). 


5 Recursive Functions 


Recursion in the context of (ordinary) procedures can be defined using the least 
fixpoint (with respect to the refinement ordering on predicate transformers) of 
a functional that corresponds to the recursively defined procedure. This method 
cannot be used directly with functional procedures, since the fcall operator is 
not monotonic with respect to the refinement ordering (nor any other suitable 
ordering). 


5.1 A Constructor for Recursion 


As a first step towards defining recursion we define an iterator. 


⊢def (∀f. iter 0 f = assign (λs. εs'. T)) ∧
     (∀n f. iter (SUC n) f = f (iter n f))


Since there is no bottom (or undefined) element to start the iteration from, 
we choose to start it from some element selected by the choice operator. This 
means that we have to be careful when defining the recursion operator: it is 
not sufficient that two consecutive iterations give the same result (the selection 
operator may cause this to happen “by accident”). However, if from some point 
on all further iterations give the same result, then the iteration has stabilised. 
This justifies the following definition 


⊢def ∀f. fmu f =
       assign (λs. εa. ∃m. ∀n. n > m ⇒ (fcall (iter n f) s = a))


Of course, the problem of nontermination is the same as before: a potentially
infinite sequence of recursive calls is modelled as terminating and returning a 
result about which we know nothing. 

The following example shows how fmu is used when defining a recursive 
functional procedure (recursion occurs inside a fcall). We define 


⊢def Factfun = λZ.
       cond (λx. x = 0)
         (assign λx. 1)
         (assign λx. x * fcall Z (x - 1))
⊢def factfun = fmu Factfun


This corresponds to a Pascal-style function definition of the following form: 


func factfun(x : num) : num =
  if x = 0 then
    return 1
  else
    return x * factfun(x - 1)
  endif


5.2 Basic Properties 


The most important property of the recursion operator fmu is that it gives the 
stabilisation point of iter, if such a point exists, for the arguments in question: 


fmu_thm =
⊢ ∀f g s k.
    (∀n. fcall (iter (k + n) f) s = g s) ⇒
    (fcall (fmu f) s = g s)


This property may not seem very informative, but it gives us the tools we need 
to prove properties of functional procedures defined with fmu. The argument k 
is crucial; it corresponds to a termination argument (an upper bound on the 
number of iterations needed to reach stability). 

As an example, we briefly describe how one proves that factfun really im- 
plements the (built-in) FACT function of the HOL system. 

According to fmu_thm it is sufficient to prove the following lemma 


⊢ ∀x n. fcall (iter (SUC x + n) Factfun) x = FACT x


We have chosen SUC x as termination argument (which is reasonable when we 
are computing the factorial of x). 
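
The role of the termination argument can be illustrated outside HOL. In the Python sketch below, `iterate` and `factfun_step` are hypothetical names, functional procedures are modelled as ordinary Python functions, and the unknown element selected by the choice operator is modelled by trying several unrelated seed functions. It is a finite experiment under these assumptions, not a proof of the lemma.

```python
import math

def factfun_step(Z):
    """One unfolding of the functional Factfun: Z stands for the recursive call."""
    return lambda x: 1 if x == 0 else x * Z(x - 1)

def iterate(n, functional, seed):
    """iter n f, started from `seed` instead of a choice-selected element."""
    proc = seed
    for _ in range(n):
        proc = functional(proc)
    return proc

# Unrelated seeds play the role of the unknown element picked by the choice operator.
seeds = [lambda x: 0, lambda x: 42, lambda x: x + 7]

def stable_from_suc_x(x, extra):
    """Check that iter (SUC x + n) Factfun gives FACT x for every seed and n < extra."""
    return all(
        iterate(x + 1 + n, factfun_step, seed)(x) == math.factorial(x)
        for seed in seeds
        for n in range(extra)
    )

lemma_holds = all(stable_from_suc_x(x, 4) for x in range(8))

# With only x iterations the seed still shows through, so the bound SUC x is needed.
too_few = iterate(3, factfun_step, seeds[1])(3)
```

After SUC x unfoldings the seed is never consulted for the argument x, which is why every seed gives the same answer from that point on. With only three unfoldings the call for x = 3 still reaches the seed and returns 3 * 2 * 1 * 42.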

The proof of this lemma follows a fairly simple routine, involving only induc- 
tion and rewriting with basic arithmetic facts. As a result, we immediately get 
the implementation theorem 


⊢ ∀x. fcall factfun x = FACT x


5.3 Example: Binary Search


The factorial example illustrates the fmu operator and it shows that it is pos- 
sible to prove properties of a recursive functional procedure. However, it can 
be argued that factfun merely encodes the standard recursive definition of the 
factorial function into imperative form, and that the proof really only performs 
the corresponding decoding. In order to show that more realistic functional pro- 
cedures can be handled, we now consider an example where the procedure does 
not correspond to an encoding of a standard recursive function definition. 

Our example is a binary search, which in standard syntax is as follows: 


func binfind(f : num → num, l : num, r : num, x : num) : bool =
  if r ≤ l then
    return F
  else
    [ var m := (l + r) div 2;
      if f m < x then
        return binfind(f,m+1,r,x)
      else if f m = x then
        return T
      else
        return binfind(f,l,m,x)
      endif endif
    ]
  endif


The aim is to show that if the first argument f is a monotonic function (i.e.,
sorted), then binfind(f,l,r,x) returns T if there exists i such that l ≤ i < r and
f i = x, and it returns F otherwise. Here we assume that the values handled
are natural numbers, but we could obviously handle any total order. However,
since HOL does not have type classes, a generic search algorithm would make 
the example more complicated, without improving the illustration of the main 
idea: reasoning about functional procedures. 
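
As a sanity check of the specification, the procedure can be transliterated into Python and compared with a linear search. The sketch below assumes that the search interval is half-open (empty when r ≤ l, with l ≤ i < r in the specification); this reading is an assumption made here so that the recursive call on (l, m) strictly shrinks the interval. The sample table and the helper `spec` are hypothetical.

```python
def binfind(f, l, r, x):
    """Search the half-open interval [l, r) of the monotonic function f for x."""
    if r <= l:
        return False
    m = (l + r) // 2
    if f(m) < x:
        return binfind(f, m + 1, r, x)
    elif f(m) == x:
        return True
    else:
        return binfind(f, l, m, x)

def spec(f, l, r, x):
    """The specification: some i with l <= i < r satisfies f i = x."""
    return any(f(i) == x for i in range(l, r))

table = [1, 3, 3, 4, 8, 8, 9, 15]            # sorted sample data (assumed)
f = lambda i: table[min(i, len(table) - 1)]  # total and monotonic on the naturals

agrees = all(
    binfind(f, l, r, x) == spec(f, l, r, x)
    for l in range(10) for r in range(10) for x in range(17)
)
```

Both recursive calls shrink the interval length r - l, by roughly half, which matches the use of the interval length as termination argument in the proof.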

We define a constant Binfind standing for the functional procedure of which 
the procedure binfind is the least fixpoint: 


⊢ Binfind = λZ.
    cond (λ(f,l,r,x). r ≤ l)
      (return λ(f,l,r,x). F)
      ((assign λ(f,l,r,x). ((l + r) DIV 2,f,l,r,x)) seq
       (cond (λ(m,f,l,r,x). f m < x)
         (return λ(m,f,l,r,x). fcall Z (f,SUC m,r,x))
         (cond (λ(m,f,l,r,x). f m = x)
           (return λ(m,f,l,r,x). T)
           (return λ(m,f,l,r,x). fcall Z (f,l,m,x))
       )))


⊢ binfind = fmu Binfind


This corresponds exactly to the standard syntax above. The assignment


assign λ(f,l,r,x). ((l + r) DIV 2,f,l,r,x)


corresponds to the block entry, adding the new variable (m) as a first state 
component. No explicit block exit is needed; it is taken care of by the return 
statement. 

The correctness of the binary search depends on the first argument being 
sorted. Thus, the theorem that we want to prove is the following: 


⊢ ∀f x.
    (∀i j. i ≤ j ⇒ f i ≤ f j) ⇒
    (∀l r. fcall binfind (f,l,r,x) =
           (∃i. l ≤ i ∧ i < r ∧ (f i = x)))


Exactly as for the simple example in Section 5.2, the crucial lemma shows 
that iteration of Binfind is guaranteed to terminate with a correct answer. The 
lemma has the following form: 


⊢ ∀f:num→num. ∀x:num.
    (∀i j. i ≤ j ⇒ f i ≤ f j) ⇒
    (∀d k l. fcall (iter (SUC d + k) Binfind) (f,l,l+d,x) =
             (∃i. l ≤ i ∧ i < l+d ∧ (f i = x)))


The critical part of the proof is an induction over d (the length of the search in- 
terval), which is also the termination argument. Since the termination argument 
is (approximately) halved rather than decreased by one, we must use general 
well-founded induction: 


⊢ ∀P. (∀n. (∀m. m < n ⇒ P m) ⇒ P n) ⇒ (∀n. P n)


rather than standard induction over the natural numbers. The proof strategy 
is essentially the same as in the factorial example, but here we need to push 
fcall both into conditionals and past assignments (see Section 3.2). The proof 
then reduces to basic arithmetic facts (including tedious details about integer 
division) and to simple properties of monotonic functions. The following two are 
typical examples of lemmas used in the proof: 


Mw ib eee Let Aer aH mA ty Se eS es 
⊢ m > 0 ⇒ m DIV 2 < m


This proof follows a general strategy that can be used in similar proofs. Ho- 
wever, automating this strategy does not seem feasible, at least not when general 
well-founded induction is used. In this example finding the right instantiations
for d, k, and l in the lemma required elaborate equation solving. Furthermore,
assumptions about the state (such as the monotonicity assumption on f) may 
be used in nontrivial ways. 

The proof also depends on pushing fcall into the structure of the functional 
procedure, and this only works going into conditionals and past assignments. 
Thus, the same strategy cannot be used if nondeterministic constructs are in- 
volved, or if there are assignments after a recursive call. It is not clear whether 
there exists a useful proof strategy in these situations. 


6 Conclusions 


We have presented an approach for modelling functional procedures in a weakest 
precondition framework. We are explicitly interested in functional procedures as 
they occur in Pascal-like programs, i.e., where the procedure is described as 
an imperative program even though the call is used as an expression. Thus 
we cannot use existing theories of functional programs (e.g., [2,10]). Instead, 
functional procedures are handled through the link between assignments and 
state transforming functions. To our knowledge, they have not been treated in
this way before.

Our aim was not to develop a calculus for expressions and functional pro- 
cedures. Instead, we developed a framework that stays within the weakest pre- 
condition tradition with simple (total) expressions, but allows function calls to 
appear inside expressions and function bodies to be written using any suitable 
programming notation with a weakest precondition semantics. The shallow em- 
bedding allows syntax to be flexible, so that any construct that can be given a 
weakest precondition semantics can be added without changing the theory (it 
also means that certain meta-level questions, such as completeness, cannot be 
asked in a meaningful way). 

We integrated this approach into the mechanised version of the refinement
calculus in the HOL system. The HOL formalisation of the refinement calculus
contains support for correctness reasoning about programs, and we reuse it for
correctness reasoning where calls to functional procedures occur. 

Two ways of proving (correctness) properties of function calls were described. 
If the functional procedure is characterised by an implementation theorem then 
the function call can be replaced directly by a reference to the corresponding 
(mathematical) function. In other cases, the proof leads to correctness proofs for 
the body of the functional procedure. 

Our approach for modelling functional procedures allows function bodies to 
be nondeterministic and/or nonterminating. Since functions in HOL are always 
total, the result of a function call is always a well-defined value. In the nondeter- 
ministic case, the result returned by the function call is an arbitrary but fixed 
value of the form εP (where P is some nonempty set). Thus the value returned
by the function call is deterministic, but the only information we can ever get 
about it is that it belongs to the set P. In this sense our approach differs
significantly from expression refinement in a weakest precondition framework
[6,7,9] where the aim is to introduce nondeterminism into the expression language
using choice operators. In our framework, a refinement of the body of a functional
procedure does not lead to a refinement of the calling program (but proofs of
properties of the calling program can generally be reused, since they are unlikely
to make essential use of the selected element εP).

The approach presented here also differs significantly from Laibinis’ work on 
(ordinary) procedures in HOL [5], although both are based on the same forma- 
lisation of weakest precondition semantics. Both are shallow embeddings based 
on the same underlying theory, which means that procedures in one framework 
could call procedures from the other. However, in our theory of functional
procedures the main effort goes into modelling and reasoning about procedure calls,
while the main focus of Laibinis’ work is on modelling parameterisation and 
reasoning about refinement of procedure bodies. 

The main drawback of our approach is the simplistic handling of nonterminating
function calls, which (through the use of the choice operator ε) return a
value about which we know nothing. This reflects an inherent feature of the HOL
logic; an expression like 5 DIV 0 (division by zero) also returns a fixed value
about which we have no information. This can be avoided in a deep embedding 
of the expression language (Norrish [8] handles correctness reasoning for a model 
of the C language in this way, starting from an operational semantics), but then 
we could not reuse the HOL expression language and we would lose the sim- 
ple handling of expressions in traditional Hoare logic and weakest precondition 
semantics that we set out to model. 

An obvious continuation to this work would be an investigation of possible 
semi-automated strategies for proofs of implementation and correctness, both for 
simple and recursive functional procedures, and adding functional procedures 
to the HOL-based Refinement Calculator tool [3], in order to provide a more 
user-friendly interface for reasoning about imperative programs with functional 
procedures.


Abstract. We present the development of a machine-checked implementation
of Stålmarck’s algorithm. First, we prove the correctness and the
completeness of an abstract representation of the algorithm. Then, we 
give an effective implementation of the algorithm that we prove correct. 


1 Introduction 


When formalizing an algorithm inside a prover, every single step has to be ju- 
stified. The result is a presentation of the algorithm where no detail has been 
omitted. Mechanizing the proofs of correctness and completeness is the main 
goal of this formalization. Whenever such proofs are intricate and involve a large 
amount of case exploration, mechanized proofs may be an interesting comple- 
ment to the ones on paper. Also, there is often a gap between an algorithm and 
its actual implementation. Bridging this gap formally and getting a reasonably 
efficient certified implementation is a valuable exercise. 

In this paper we explain how this has been done for Stålmarck’s algorithm [10]
using the Coq prover [6]. This algorithm is a tautology checker. It is patented and 
has been successfully applied in industry. As it includes a number of heuristics, 
what we formalize is an abstract version of the algorithm. We prove different 
properties of the algorithm including correctness and completeness. We also 
cover two ways of ensuring that the result of an implementation is correct. We 
define execution traces and prove that these traces can be used to check that 
a formula is a tautology in a more elementary way. We also derive a certified 
implementation. 

The paper is structured as follows. The algorithm is presented in Section 2. 
The formalization of the algorithm is described in Section 3. The notion of trace 
is introduced in Section 4. Finally the implementation is given in Section 5. 


2 The Algorithm 


Fig. 1. Annotated tree-like representation of the formula


Stålmarck’s algorithm is a tautology checker. It deals with boolean formulae,
i.e. expressions formed with the two constants T (true), ⊥ (false), the unary
symbol ¬ (negation), the binary symbols & (conjunction), # (disjunction),
↦ (implication), = (equivalence), and a set of variables (v_i) with i ∈ N. For
example, the following formula is a boolean expression containing four variables
v1, v2, v3 and v4:


((v1 ↦ v2) & (v3 ↦ v4)) ↦ ((v1 & v3) ↦ (v2 & v4))


It is also a tautology. This means that the formula is valid (true) for any value 
of its variables. The first step of the algorithm is to reduce the number of binary 
symbols using the following equalities: 


A # B = ¬(¬A & ¬B)
A ↦ B = ¬(A & ¬B)


With this transformation, we obtain an equivalent formula containing only con- 
junctions, equalities and negations. By applying this transformation on our ex- 
ample, we get: 


¬((¬(v1 & ¬v2) & ¬(v3 & ¬v4)) & ((v1 & v3) & ¬(v2 & v4)))
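
The two rewriting equalities and the reduced form of the example can be checked by exhaustive enumeration. The Python sketch below is only such a truth-table check; the function names are hypothetical, and double negations in the reduced formula are taken to be simplified away.

```python
from itertools import product

def implies(a, b):
    return (not a) or b

def original(v1, v2, v3, v4):
    """((v1 -> v2) & (v3 -> v4)) -> ((v1 & v3) -> (v2 & v4))"""
    return implies(implies(v1, v2) and implies(v3, v4),
                   implies(v1 and v3, v2 and v4))

def reduced(v1, v2, v3, v4):
    """The same formula written with conjunction and negation only."""
    return not ((not (v1 and not v2) and not (v3 and not v4))
                and ((v1 and v3) and not (v2 and v4)))

rows = list(product([False, True], repeat=4))

# The two equalities used to eliminate disjunction and implication.
rewrite_rules_ok = all(
    (a or b) == (not (not a and not b)) and implies(a, b) == (not (a and not b))
    for a, b in product([False, True], repeat=2)
)
same_formula = all(original(*r) == reduced(*r) for r in rows)
is_tautology = all(original(*r) for r in rows)
```

Enumeration costs 2^n evaluations for n variables, which is what the refutation procedure described next tries to avoid.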


The algorithm manipulates data structures called triplets. To handle negation,
variables are signed: ±v_i. A triplet is a group of three signed variables and a
connector (either & or =), meaning that the first variable has the value of the
result of applying the connector to the other two variables. The two kinds of
triplets are written as v_i := ±v_j & ±v_k and v_i := ±v_j = ±v_k. Every reduced
boolean expression has a corresponding list of triplets. If we consider the tree
representation of the formula and annotate every binary tree with a fresh new
variable as in Figure 1, taking every binary node of the annotated tree to form
a triplet gives us the list of triplets representing the formula. In our example, we
get the following list:


v5  := v1 & ¬v2      (1)
v6  := v3 & ¬v4      (2)
v7  := ¬v5 & ¬v6     (3)
v8  := v1 & v3       (4)
v9  := v2 & v4       (5)
v10 := v8 & ¬v9      (6)
v11 := v7 & v10      (7)


The value of the formula is the value of ¬v11. The algorithm works by refutation.
It assumes that the formula is false and tries to reach a contradiction by
propagation. There is a set of rules for each kind of triplets that defines how to
do this propagation. For the triplet v_i := v_j & v_k, we have nine rules:


if v_i = ¬v_j,  propagate v_j = T and v_k = ⊥   &i¬j
if v_i = ¬v_k,  propagate v_j = ⊥ and v_k = T   &i¬k
if v_j = v_k,   propagate v_i = v_k             &jk
if v_j = ¬v_k,  propagate v_i = ⊥               &j¬k
if v_i = T,     propagate v_j = T and v_k = T   &iT
if v_j = T,     propagate v_i = v_k             &jT
if v_j = ⊥,     propagate v_i = ⊥               &j⊥
if v_k = T,     propagate v_i = v_j             &kT
if v_k = ⊥,     propagate v_i = ⊥               &k⊥


For the triplet v_i := v_j = v_k, we have twelve rules:


if v_i = v_j,   propagate v_k = T      =ij
if v_i = ¬v_j,  propagate v_k = ⊥      =i¬j
if v_i = v_k,   propagate v_j = T      =ik
if v_i = ¬v_k,  propagate v_j = ⊥      =i¬k
if v_j = v_k,   propagate v_i = T      =jk
if v_j = ¬v_k,  propagate v_i = ⊥      =j¬k
if v_i = T,     propagate v_j = v_k    =iT
if v_i = ⊥,     propagate v_j = ¬v_k   =i⊥
if v_j = T,     propagate v_i = v_k    =jT
if v_j = ⊥,     propagate v_i = ¬v_k   =j⊥
if v_k = T,     propagate v_i = v_j    =kT
if v_k = ⊥,     propagate v_i = ¬v_j   =k⊥


Fig. 2. The dilemma rule


In our case, this simple mechanism of propagation is sufficient to establish that
the formula is a tautology. We start with the state where v11 = T (¬v11 = ⊥)
and apply the following rules:


v11 = T,  we get  v7 = T and v10 = T   by &iT on (7)
v7 = T,   we get  v5 = ⊥ and v6 = ⊥    by &iT on (3)
v10 = T,  we get  v8 = T and v9 = ⊥    by &iT on (6)
v8 = T,   we get  v1 = T and v3 = T    by &iT on (4)
v1 = T,   we get  v5 = ¬v2             by &jT on (1)
v2 = T,   we get  v9 = v4              by &jT on (5)
v3 = T,   we get  v6 = ¬v4             by &jT on (2)


The last equation is a contradiction since we know that v6 = ⊥ and v4 = v9 = ⊥.
Note that the order in which propagation rules are selected is arbitrary. 
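
The refutation of the running example can be reproduced with a small program. The Python sketch below is an illustration under several assumptions, not the formalized algorithm: signed variables are nonzero integers (+n and -n), the constant T is a reserved index `TRUE` chosen here (variable indices must stay below it), equations are kept in a signed union-find, only the nine rules for & triplets are implemented, and rules are applied in a fixed sweep order.

```python
TRUE = 1000  # hypothetical reserved index standing for the constant T

class State:
    """Equations between signed literals, stored as a signed union-find."""

    def __init__(self):
        self.parent = {}            # variable -> signed literal it is equal to
        self.contradictory = False

    def find(self, lit):
        sign, var = (1 if lit > 0 else -1), abs(lit)
        while var in self.parent:
            p = self.parent[var]
            sign, var = (sign if p > 0 else -sign), abs(p)
        return sign * var

    def equal(self, a, b):
        return self.find(a) == self.find(b)

    def add(self, a, b):
        """Record a = b; return True when this is new information."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if ra == -rb:
            self.contradictory = True
        else:
            self.parent[abs(ra)] = rb if ra > 0 else -rb
        return True

def and_rules(s, i, j, k):
    """Equations licensed by the nine rules for the triplet i := j & k."""
    T, out = TRUE, []
    if s.equal(i, -j): out += [(j, T), (k, -T)]
    if s.equal(i, -k): out += [(j, -T), (k, T)]
    if s.equal(j, k):  out += [(i, k)]
    if s.equal(j, -k): out += [(i, -T)]
    if s.equal(i, T):  out += [(j, T), (k, T)]
    if s.equal(j, T):  out += [(i, k)]
    if s.equal(j, -T): out += [(i, -T)]
    if s.equal(k, T):  out += [(i, j)]
    if s.equal(k, -T): out += [(i, -T)]
    return out

def propagate(s, triplets):
    """Apply the rules until nothing changes or a contradiction appears."""
    changed = True
    while changed and not s.contradictory:
        changed = False
        for i, j, k in triplets:
            for a, b in and_rules(s, i, j, k):
                changed |= s.add(a, b)
    return s

# Triplets (1) to (7); the formula is ¬v11, so refutation starts from v11 = T.
triplets = [(5, 1, -2), (6, 3, -4), (7, -5, -6), (8, 1, 3),
            (9, 2, 4), (10, 8, -9), (11, 7, 10)]
start = State()
start.add(11, TRUE)
refuted = propagate(start, triplets).contradictory
```

The sweep order differs from the trace above, which is harmless since the order of rule selection is arbitrary. For a formula that is not a tautology, such as v1 & v2 with the single triplet (3, 1, 2) and the assumption v3 = ⊥, no rule applies and no contradiction is found.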

Most of the time the propagation alone is not sufficient to conclude. In that
case the dilemma rule can be applied. This rule works as depicted in Figure 2.
Given a state S, it takes two arbitrary variables v_i and v_j and creates two
separate branches. In one branch, it adds the equation v_i = v_j to get S1.
In the second branch it adds the equation v_i = ¬v_j to get S2. On each of
these branches, the propagation is applied to obtain S3 and S4 respectively.
Then the result of the dilemma rule is the intersection S' of S3 and S4 that
contains all the equations that are valid independently of the relation between
v_i and v_j. If one of the branches gives a contradiction, the result is the final
state of the other branch. If we obtain a contradiction on both branches, a
contradiction is reached. The dilemma rule is iterated on all pairs of variables
taking the state resulting from the previous application as the initial state of
the next one, till no new information is gained or a contradiction is reached.
If no contradiction is reached, the same process is applied using four variables
creating four branches (v_i = v_j, v_k = v_l), (v_i = v_j, v_k = ¬v_l),
(v_i = ¬v_j, v_k = v_l) and (v_i = ¬v_j, v_k = ¬v_l). If the iteration on four
variables is not sufficient to conclude, we can proceed using 6 then 8, ...,
then 2 * n variables. This gives us a generalized schema of the dilemma rule as
depicted in Figure 3. The nice property of this algorithm is that the dilemma
rule with four variables v_i, v_j, v_k and v_l with the restriction that
v_j = v_l = T is sufficient to find most of the tautologies occurring in formal
verification.


Fig. 3. The generalized dilemma rule


3 Formalization of the Algorithm 


3.1 Triplets 


To define triplets in Coq we first need a notion of signed variables. For this, we
introduce the type rZ with two constructors on nat: + and -. We also take the
convention that T is represented by +0 and ⊥ by -0. On rZ, we define the usual
operations: complement (¬), absolute value (| |) and an order < such that i < j
if and only if |i| < |j|.

A triplet is a set of three signed variables and a binary operation. We define 
the new type triplet with the only constructor Triplet: 


Inductive triplet : Set :=
  Triplet : rBoolOp → rZ → rZ → rZ → triplet


where rBoolOp is an enumerated type containing the two elements rAnd and
rEq. In the following we use the usual pretty-printing convention for triplets:
(Triplet rAnd i j k) and (Triplet rEq i j k) are written as i := j & k and
i := j = k respectively.
In order to define an evaluation on triplets, we first need to define an
evaluation on rZ as:

Definition rZEval : (nat → bool) → rZ → bool :=
  λf : nat → bool. λr : rZ.
    Cases r of
      (+ n) ⇒ (f n)
    | (- n) ⇒ ¬(f n)
    end.

For the triplet we simply check that the first variable is equal to the result of
applying the boolean operation to the other two variables:

Definition tZEval : (nat → bool) → triplet → bool :=
  λf : nat → bool. λt : triplet.
    Cases t of
      (Triplet r v1 v2 v3) ⇒
        (rZEval f v1) = ((rBoolOpFun r) (rZEval f v2) (rZEval f v3))
    end.


where the function rBoolOpFun maps elements of rBoolOp into their correspon- 
ding boolean operation. 

As the algorithm manipulates a list of triplets, we introduce the notion of 
realizability: a valuation realizes a list of triplets if the evaluation of each triplet 
in the list gives true: 


Definition realizeTriplets : (nat → bool) → (list triplet) → Prop :=
  λf : (nat → bool). λL : (list triplet). ∀t : triplet. t in L ⇒ (tZEval f t) = T.
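
The evaluation functions translate almost directly into an executable form. The Python sketch below mirrors rZEval, tZEval and realizeTriplets as boolean functions; the names `RZ`, `Triplet`, `rz_eval`, `tz_eval` and `realize_triplets` are hypothetical, rBoolOp is modelled by the strings "and" and "eq", and realizability is computed as a boolean rather than stated as a proposition.

```python
from typing import Callable, List, NamedTuple

Valuation = Callable[[int], bool]

class RZ(NamedTuple):
    """A signed variable: (+ n) is RZ(True, n), (- n) is RZ(False, n)."""
    positive: bool
    index: int

class Triplet(NamedTuple):
    op: str   # "and" or "eq", standing for rAnd and rEq
    v1: RZ
    v2: RZ
    v3: RZ

def rz_eval(f: Valuation, r: RZ) -> bool:
    """Value of a signed variable under the valuation f."""
    value = f(r.index)
    return value if r.positive else not value

def tz_eval(f: Valuation, t: Triplet) -> bool:
    """True when v1 equals the connector applied to v2 and v3."""
    a, b = rz_eval(f, t.v2), rz_eval(f, t.v3)
    rhs = (a and b) if t.op == "and" else (a == b)
    return rz_eval(f, t.v1) == rhs

def realize_triplets(f: Valuation, triplets: List[Triplet]) -> bool:
    """True when f evaluates every triplet of the list to true."""
    return all(tz_eval(f, t) for t in triplets)

pos = lambda n: RZ(True, n)
neg = lambda n: RZ(False, n)

# v5 := v1 & ¬v2 and v8 := v1 & v3 from the running example.
example = [Triplet("and", pos(5), pos(1), neg(2)),
           Triplet("and", pos(8), pos(1), pos(3))]

values = {0: True, 1: True, 2: False, 3: True, 5: True, 8: True}
f = lambda n: values.get(n, False)
g = lambda n: n != 5 and values.get(n, False)   # same valuation with v5 false
```

Here f realizes both triplets, while g does not, because it assigns false to v5 although v1 & ¬v2 is true. Index 0 is mapped to true, following the convention that +0 represents T.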


Another interesting notion is the one of valid equation with respect to a list of
triplets. An equation i = j is valid if for every valuation f that realizes a list of
triplets L, f gives the same value for i and j:


Definition validEquation : (list triplet) → rZ → rZ → Prop :=
  λL : (list triplet). λp, q : rZ. ∀f : (nat → bool).
    (realizeTriplets f L) ⇒ (f 0) = T ⇒ (rZEval f p) = (rZEval f q).


The condition (f 0) = T is here to keep the convention that (+0) represents T.
Not every list of triplets corresponds to a boolean expression. To express the
notion of tautology on triplets we simply ask for the top variable of the generated
list to be evaluated to true:

Definition tTautology : Expr → Prop :=
  λe : Expr.
    Cases (makeTriplets e) of
      (l, s) ⇒ (validEquation l s T)
    end.

where Expr is the type representing boolean expressions and makeTriplets is the
function that computes the list of triplets corresponding to a given expression
and its top variable. With this definition, we have the following theorem:


Theorem TautoEquivtTauto :
  ∀e : Expr. (tautology e) ⇔ (tTautology e).


where the predicate tautology defines the usual notion of tautology on boolean 
expressions. 


3.2 States 


All the operations of checking and adding equations are done with respect to a 
state. We have chosen to represent states as lists of pairs of signed variables. 


Definition State : Set := (list rZ * rZ).


The inductive predicate ~ defines when two variables are equal:


Inductive ~ : State → rZ → rZ → Prop :=
    ~Ref : ∀a : rZ. ∀S : State. a ~_S a
  | ~In : ∀a, b : rZ. ∀S : State. (a, b) in S ⇒ a ~_S b
  | ~Sym : ∀a, b : rZ. ∀S : State. a ~_S b ⇒ b ~_S a
  | ~Inv : ∀a, b : rZ. ∀S : State. a ~_S b ⇒ ¬a ~_S ¬b
  | ~Trans : ∀a, b, c : rZ. ∀S : State. a ~_S b ⇒ b ~_S c ⇒ a ~_S c
  | ~Contr : ∀a, b, c : rZ. ∀S : State. a ~_S ¬a ⇒ b ~_S c.


The logic of Coq is constructive. This means that the theorem ∀P : Prop. P ∨ ¬P
is not valid. However, instances of this theorem can be proved. For example we
have:

Theorem stateDec : ∀S : State. ∀a, b : rZ. a ~_S b ∨ ¬(a ~_S b).


The property of being a contradictory state is defined as:

Definition contradictory : State → Prop := λS : State. ∃a : rZ. a ~_S ¬a.


Note that from the definition of ~, it follows that in a contradictory state all 
equalities are valid. Inclusion and equality on states are defined as: 


Definition ⊂ : State → State → Prop :=
  λS1, S2 : State. ∀a, b : rZ. a ~_S1 b ⇒ a ~_S2 b.


Definition ≡ : State → State → Prop := λS1, S2 : State. S1 ⊂ S2 ∧ S2 ⊂ S1.


A valuation realizes a state if all the equations of the state are valid: 


Definition realizeState : (nat → bool) → State → Prop :=
  λf : nat → bool. λS : State. ∀a, b : rZ. (a, b) in S ⇒ (rZEval f a) = (rZEval f b).


We also need to define two basic functions on states: union and intersection. The
union of two states is simply the state given by the concatenation of the two
lists. In the following we use the notation (a, b)+S to denote [(a, b)] ∪ S.

The intersection of two states is not the intersection of their lists¹. The
function that computes the intersection of S1 and S2 first generates the list L1
of all non-trivial equations of S1, i.e. all pairs (a, b) such that a ~_S1 b and a ≠ b.
Then, it removes the equations that are not valid in S2 from L1. The resulting
list represents S1 ∩ S2.
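
The difference between intersecting states and intersecting lists can be made concrete. The Python sketch below is a simplified model, not the Coq function: signed variables are nonzero integers, the constant T is left out, the relation ~ is decided with a signed union-find, and the names `make_equal` and `intersection` are hypothetical.

```python
from itertools import permutations

def make_equal(state):
    """Return eq(a, b) deciding whether a = b follows from the pairs in `state`."""
    parent = {}
    contradictory = False

    def find(lit):
        sign, var = (1 if lit > 0 else -1), abs(lit)
        while var in parent:
            p = parent[var]
            sign, var = (sign if p > 0 else -sign), abs(p)
        return sign * var

    for a, b in state:
        ra, rb = find(a), find(b)
        if ra == -rb:
            contradictory = True          # every equation is valid from here on
        elif ra != rb:
            parent[abs(ra)] = rb if ra > 0 else -rb

    return lambda a, b: contradictory or find(a) == find(b)

def intersection(s1, s2):
    """Non-trivial equations of s1 that are also valid in s2."""
    eq1, eq2 = make_equal(s1), make_equal(s2)
    literals = sorted({s * abs(x) for pair in s1 for x in pair for s in (1, -1)})
    l1 = [(a, b) for a, b in permutations(literals, 2) if eq1(a, b)]
    return [(a, b) for a, b in l1 if eq2(a, b)]

s1, s2 = [(1, 2), (2, 3)], [(1, 3)]
as_states = intersection(s1, s2)
as_lists = [pair for pair in s1 if pair in s2]
```

On these two lists the list intersection is empty, while the state intersection contains (1, 3) together with its symmetric and negated variants, so as a state it is equal to [(1, 3)].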


3.3 One-Step Propagation


We formalize the one-step propagation as an inductive predicate → whose
definition is given in Appendix A. Note that S1 →_t S2 only means that there exists
a rule that produces S2 from S1 using the triplet t. Because → is defined as a
predicate, no particular strategy of rule application is assumed. The relation →
is compatible with the equality as defined in Section 3.2:


¹ For example, given the lists [(1,2);(2,3)] and [(1,3)], their intersection as states is
[(1,3)] while their intersection as lists is [].


Theorem →≡Ex :
  ∀S1, S2, S3 : State. ∀t : triplet.
    S1 →_t S2 ⇒ S1 ≡ S3 ⇒ ∃S4 : State. S3 →_t S4 ∧ S4 ≡ S2.


Also a propagation only adds equations: 


Theorem →∪Ex :
  ∀S1, S2 : State. ∀t : triplet. S1 →_t S2 ⇒ ∃S3 : State. S2 ≡ S3 ∪ S1.


A corollary of this last theorem is that a propagation always produces a bigger
state:

Theorem →Incl :
  ∀S1, S2 : State. ∀t : triplet. S1 →_t S2 ⇒ S1 ⊂ S2.


In a similar way we can prove that the relation behaves as a congruence:


Theorem →CongruentEx :
  ∀S1, S2, S3 : State. ∀t : triplet.
    S1 →_t S2 ⇒ ∃S4 : State. (S3 ∪ S1) →_t S4 ∧ S4 ≡ (S3 ∪ S2).


This gives us as a corollary that the relation is monotone:

Theorem →MonotoneEx :
  ∀S1, S2, S3 : State. ∀t : triplet.
    S1 →_t S3 ⇒ S1 ⊂ S2 ⇒ ∃S4 : State. S2 →_t S4 ∧ S3 ⊂ S4.


Another interesting property is that the relation is confluent:


Theorem →ConflEx :
  ∀t1, t2 : triplet. ∀S1, S2, S3 : State. S1 →_t1 S2 ⇒ S1 →_t2 S3 ⇒
    ∃S4, S5 : State. S2 →_t2 S4 ∧ S3 →_t1 S5 ∧ S4 ≡ S5.


Note that to establish these properties we do not use the particular equations
that are checked or added. All relations with a shape similar to the one of →
would have these properties.


The first semantic property that we have proved is that preserving the
realizability for the one-step propagation is equivalent in some sense to evaluating
the triplet to T:

Theorem realizeStateEvalEquiv :
  ∀f : nat → bool. ∀S1, S2 : State. ∀t : triplet.
    (f 0) = T ⇒ (realizeState f S1) ⇒ S1 →_t S2
      ⇒ ((realizeState f S2) ⇔ (tZEval f t) = T).


Another key semantic property is that no matter which rule is applied, the
resulting state is the same:

Theorem →Eq :
  ∀t : triplet. ∀S1, S2, S3 : State. S1 →_t S2 ⇒ S1 →_t S3 ⇒ S2 ≡ S3.


Moreover, a triplet is essentially useful only once:

Theorem →Invol :
  ∀t : triplet. ∀S1, S2, S3, S4 : State.
    S1 →_t S2 ⇒ S2 ⊂ S3 ⇒ S3 →_t S4 ⇒ S3 ≡ S4.


The proofs of the above theorems are not very deep and mostly involve exploring 
the twenty-one possible rules of one-step propagation. 


3.4 Propagation 


The propagation consists in iterating the one-step propagation. We take the
reflexive transitive closure of →_t:

Inductive →* : State → (list triplet) → State → Prop :=
    →*Ref : ∀S1, S2 : State. ∀L : (list triplet). S1 ≡ S2 ⇒ S1 →*_L S2
  | →*Trans : ∀S1, S2, S3 : State. ∀L : (list triplet). ∀t : triplet.
      t in L ⇒ S1 →_t S2 ⇒ S2 →*_L S3 ⇒ S1 →*_L S3.


All the properties of the one-step propagation can be lifted to the propagation.
The exception is the theorem about realizability which has a simple implication
since the propagation might use a strict subset of the list of triplets:

Theorem realizeStateEval* :
  ∀f : nat → bool. ∀S1, S2 : State. ∀L : (list triplet).
    (f 0) = T ⇒ (realizeState f S1) ⇒ S1 →*_L S2
      ⇒ (realizeTriplets f L) ⇒ (realizeState f S2).


Finally the property that a triplet is useful only once is captured by:


Theorem →*TermEx :
  ∀L : (list triplet). ∀S1, S2 : State. S1 →*_L S2 ⇒
    (S1 ≡ S2) ∨ (∃t : triplet. ∃S3 : State. t in L ∧ S1 →_t S3 ∧ S3 →*_(L - [t]) S2).


where L - [t] denotes the list obtained by removing t from L.


3.5 The Dilemma Rule 


As we did for propagation, the dilemma rule is non-deterministic and modeled
by a predicate. Also, we allow an arbitrary (but finite) number of splits:


Inductive →d : State → (list triplet) → State → Prop :=
    →dRef : ∀S1, S2 : State. ∀L : (list triplet). S1 →*_L S2 ⇒ S1 →d_L S2
  | →dSplit : ∀a, b : rZ. ∀S1, S2, S3, S4 : State. ∀L : (list triplet). ∀t : triplet.
      (a, b)+S1 →d_L S2 ⇒ (a, ¬b)+S1 →d_L S3 ⇒ S2 ∩ S3 ≡ S4 ⇒ S1 →d_L S4.


The relation →d is compatible with the equality:


Theorem →d≡ :
  ∀S1, S2, S3, S4 : State. ∀L : (list triplet).
    S1 →d_L S2 ⇒ S3 ≡ S1 ⇒ S4 ≡ S2 ⇒ S3 →d_L S4.


The same theorems about inclusion also hold:


Theorem →d∪Ex :
  ∀S1, S2 : State. ∀L : (list triplet). S1 →d_L S2 ⇒ ∃S3 : State. S2 ≡ S3 ∪ S1.


Theorem →dIncl :
  ∀S1, S2 : State. ∀L : (list triplet). S1 →d_L S2 ⇒ S1 ⊂ S2.


Unfortunately as we only have (S1 ∩ S2) ∪ S3 ⊂ (S1 ∪ S3) ∩ (S2 ∪ S3) and not
(S1 ∩ S2) ∪ S3 ≡ (S1 ∪ S3) ∩ (S2 ∪ S3), the relation is not a congruence. A simple
way to recapture this congruence would be to define →d as:

Inductive →d : State → (list triplet) → State → Prop :=
    →dRef : ∀S1, S2 : State. ∀L : (list triplet). S1 →*_L S2 ⇒ S1 →d_L S2
  | →dSplit : ∀a, b : rZ. ∀S1, S2, S3, S4 : State. ∀L : (list triplet). ∀t : triplet.
      (a, b)+S1 →d_L S2 ⇒ (a, ¬b)+S1 →d_L S3 ⇒ S1 ⊂ S4 ⊂ (S2 ∩ S3) ⇒ S1 →d_L S4.


but this would mean considering the merging of the two branches as a non- 
deterministic operation, so we prefer our initial definition. Even though the re- 
lation is not a congruence, it is monotone: 
Theorem →^d Monotone:
∀L: (list triplet). ∀S1, S2, S3: State.
S1 →^d_L S3 ⇒ S1 ⊂ S2 ⇒ ∃S4: State. S2 →^d_L S4 ∧ S3 ⊂ S4.


and it is also confluent: 
Theorem →^d Confluent:
∀L: (list triplet). ∀S1, S2, S3: State.
S1 →^d_L S2 ⇒ S1 →^d_L S3 ⇒ ∃S4: State. S2 →^d_L S4 ∧ S3 →^d_L S4.


The last property we have proved about the dilemma rule is that it preserves 
realizability: 
Theorem realizeStateEval^d:
∀f: nat → bool. ∀S1, S2: State. ∀L: (list triplet).
(f 0) = ⊤ ⇒ (realizeState f S1) ⇒ S1 →^d_L S2
⇒ (realizeTriplets f L) ⇒ (realizeState f S2).


3.6 Stålmarck’s Algorithm


Stålmarck’s algorithm is the reflexive transitive closure of the dilemma rule:
Inductive →^s : State → (list triplet) → State → Prop :=
→^s Ref : ∀S1, S2: State. ∀L: (list triplet). S1 = S2 ⇒ S1 →^s_L S2
| →^s Trans : ∀S1, S2, S3: State. ∀L: (list triplet).
S1 →^d_L S2 ⇒ S2 →^s_L S3 ⇒ S1 →^s_L S3.


As for →^d we get the standard properties:


Theorem →^s= :
∀S1, S2, S3, S4: State. ∀L: (list triplet).
S1 →^s_L S2 ⇒ S3 = S1 ⇒ S4 = S2 ⇒ S3 →^s_L S4.

Theorem →^s∪Ex:
∀S1, S2: State. ∀L: (list triplet). S1 →^s_L S2 ⇒ ∃S3: State. S2 = S3 ∪ S1.

Theorem →^s Incl:
∀S1, S2: State. ∀L: (list triplet). S1 →^s_L S2 ⇒ S1 ⊂ S2.

Theorem →^s Monotone:
∀L: (list triplet). ∀S1, S2, S3: State.
S1 →^s_L S3 ⇒ S1 ⊂ S2 ⇒ ∃S4: State. S2 →^s_L S4 ∧ S3 ⊂ S4.

Theorem →^s Confluent:
∀L: (list triplet). ∀S1, S2, S3: State.
S1 →^s_L S2 ⇒ S1 →^s_L S3 ⇒ ∃S4: State. S2 →^s_L S4 ∧ S3 →^s_L S4.

Theorem realizeStateEval^s:
∀f: nat → bool. ∀S1, S2: State. ∀L: (list triplet).
(f 0) = ⊤ ⇒ (realizeState f S1) ⇒ S1 →^s_L S2
⇒ (realizeTriplets f L) ⇒ (realizeState f S2).


Only the last property is relevant for the correctness of the algorithm. From the
theorem realizeStateEval^s, the following property is easily derived:
Theorem stålmarckValidEquation:
∀L: (list triplet). ∀a, b: rZ. ∀S: State.
[(a, −b)] →^s_L S ⇒ (contradictory S) ⇒ (validEquation L a b).


Once we have this theorem, we can glue together all the theorems about
tautology to get the correctness:
Theorem stålmarckCorrect:
∀e: Expr. ∀S: State.
Cases (makeTriplets e) of
(l, s) ⇒ [(s, ⊥)] →^s_l S ⇒ (contradictory S) ⇒ (Tautology e)
end.
Another property that has been formalized is the completeness of the algorithm:
Theorem stålmarckComplete:
∀e: Expr. (Tautology e) ⇒ ∃S: State.
Cases (makeTriplets e) of
(l, s) ⇒ [(s, ⊥)] →^s_l S ∧ (contradictory S)
end.
This is proved by showing that if e is a tautology, we obtain a contradiction by
applying the dilemma rule on all the variables in the list of triplets. The program
extracted from the constructive proof of the theorem stålmarckComplete would
thus be comparable to the one that computes the truth table.
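
To make the comparison with truth tables concrete, here is a minimal OCaml sketch of a truth-table tautology check. The type expr and the functions max_var, eval and tautology are hypothetical names introduced for illustration; they are not the paper's Expr or its extracted program, and variables are assumed to be numbered from 0.

```ocaml
(* Hypothetical expression type; the paper's Expr is not reproduced here. *)
type expr =
  | Var of int
  | Not of expr
  | And of expr * expr
  | Or of expr * expr
  | Imp of expr * expr

(* Largest variable index occurring in e. *)
let rec max_var = function
  | Var i -> i
  | Not e -> max_var e
  | And (a, b) | Or (a, b) | Imp (a, b) -> max (max_var a) (max_var b)

(* Evaluate e under the valuation f. *)
let rec eval f = function
  | Var i -> f i
  | Not e -> not (eval f e)
  | And (a, b) -> eval f a && eval f b
  | Or (a, b) -> eval f a || eval f b
  | Imp (a, b) -> not (eval f a) || eval f b

(* Split on every variable in turn, trying both truth values, and
   require the expression to hold in every resulting valuation. *)
let tautology e =
  let n = max_var e in
  let rec go i env =
    if i > n then eval (fun v -> List.assoc v env) e
    else go (i + 1) ((i, true) :: env) && go (i + 1) ((i, false) :: env)
  in
  go 0 []
```

Like a dilemma split on every variable, this explores both truth values of each variable, so its cost is exponential in the number of variables.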


4 Trace 


Our relation →^s contains all possible execution paths. The choice of the rules, the
choice of the triplets and the choice of the variables for the dilemma rule are not
explicitly given. A natural question arises on what an explicit implementation
should produce to provide a certificate that it has reached a contradiction. Of
course, a complete trace of the execution (states, rules, triplets, variables) is a
valid certificate. In fact, we can do much better and keep only the triplets that
have been used and the variables on which the dilemma rule has been applied.
We define our notion of trace as:
Inductive Trace: Set :=
emptyTrace: Trace
| tripletTrace: triplet → Trace
| seqTrace: Trace → Trace → Trace
| dilemmaTrace: rZ → rZ → Trace → Trace → Trace.


The semantics is given by the following predicate that evaluates a trace: 


Inductive evalTrace: State → Trace → State → Prop :=
emptyEval : ∀S1, S2: State. S1 = S2 ⇒ (evalTrace S1 emptyTrace S2)
| tripletEval : ∀S1, S2: State. ∀t: triplet.
S1 →_t S2 ⇒ (evalTrace S1 (tripletTrace t) S2)
| seqEval : ∀S1, S2, S3: State. ∀T1, T2: Trace.
(evalTrace S1 T1 S2) ⇒ (evalTrace S2 T2 S3)
⇒ (evalTrace S1 (seqTrace T1 T2) S3)
| dilemmaEval : ∀S1, S2, S3, S4: State. ∀a, b: rZ. ∀T1, T2: Trace.
(evalTrace (a, b)+S1 T1 S2) ⇒ (evalTrace (a, −b)+S1 T2 S3)
⇒ S2 ∩ S3 = S4 ⇒ (evalTrace S1 (dilemmaTrace a b T1 T2) S4).


The fact that a trace determines a unique computation up to state equality is 
asserted by the following theorem: 
Theorem evalTraceEq:
∀S1, S2, S3, S4: State. ∀T: Trace.
(evalTrace S1 T S2) ⇒ (evalTrace S3 T S4) ⇒ S1 = S3 ⇒ S2 = S4.


Conversely it is possible to get a trace from any non-deterministic computation: 
Theorem stålmarckExTrace:
∀S1, S2: State. ∀L: (list triplet).
S1 →^s_L S2 ⇒ ∃T: Trace. (evalTrace S1 T S2) ∧ T in L.


The second condition requires all the triplets in the trace T to be in L. 
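
As an illustration, the trace type and the side condition "T in L" could be mirrored in OCaml roughly as follows. This is only a sketch: the encodings of rZ as an int and of triplets as a record, and the helper names triplets_of and trace_in, are assumptions made for the example and are not taken from the development.

```ocaml
(* Assumed encodings: signed variables as non-zero ints, triplets as records. *)
type rz = int
type op = And | Eq
type triplet = { op : op; p : rz; q : rz; r : rz }

type trace =
  | EmptyTrace
  | TripletTrace of triplet
  | SeqTrace of trace * trace
  | DilemmaTrace of rz * rz * trace * trace

(* All triplets mentioned in a trace, in left-to-right order. *)
let rec triplets_of = function
  | EmptyTrace -> []
  | TripletTrace t -> [ t ]
  | SeqTrace (t1, t2) | DilemmaTrace (_, _, t1, t2) ->
      triplets_of t1 @ triplets_of t2

(* The condition "T in L": every triplet of the trace occurs in l. *)
let trace_in l tr = List.for_all (fun t -> List.mem t l) (triplets_of tr)
```

A checker for such certificates would additionally replay the trace against the propagation rules. The sketch covers only the membership condition.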


5 Implementation 


Because of space limitations, we are only going to sketch the different components
of the implementation. In particular, we do not make explicit the rules of sign
using the notation ±v to denote either +v or −v.


5.1 Memory 


We represent non-contradictory states using functional arrays. Appendix B lists 
the different axioms we are using in our development. The size of the array is 


400 P. Letouzey and L. Théry 


maxN, the natural number that exceeds by at least one all the variables in the
list of triplets. The type of the elements of the array is defined as follows: 


Inductive vM: Set :=
ref: rZ → vM
| class: (list rZ) → vM.


The value of the location i depends on the smallest element a such that +i ~ a.
If i ≠ |a|, the location i contains the value (ref a). Otherwise, it contains the
value (class L), where L is the ordered list of the elements b such that +i ~ b
and |b| ≠ i. All the constraints about the different values of the array are
concentrated in the predicate WellFormed:


Definition Mem: Set:= {r: (Array maxN vM)| (WellFormed maxN r)}. 


Checking equality Given a memory m, it is easy to build a function min_m
that returns for any element a of rZ the smallest b such that a ~_m b. To check
the equality between a and b in m, it is then sufficient to compare (min_m a) and
(min_m b).
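
The lookup can be sketched in OCaml as below. This is an illustrative sketch under assumptions: rZ is encoded as a non-zero int whose sign is the sign of the variable, the memory is a plain array indexed by |a| without the WellFormed proof component, and the names min_m and equal_in are chosen for the example.

```ocaml
type rz = int                      (* assumption: +v / -v as a non-zero int *)
type vm = Ref of rz | Class of rz list

(* Representative of a's class, read off location |a|.  A Ref cell stores
   the representative of +|a|, so the sign is flipped for a negative a. *)
let min_m (m : vm array) (a : rz) : rz =
  match m.(abs a) with
  | Ref b -> if a > 0 then b else -b
  | Class _ -> a

(* a and b are equal in m when they have the same representative. *)
let equal_in m a b = min_m m a = min_m m b
```

For example, with location 1 holding Class [-2; 3], location 2 holding Ref (-1) and location 3 holding Ref 1, the elements -2 and +3 share the representative +1.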


Adding an equation Given a memory m, it is also easy to build a function l_m
that returns for any element a of rZ the ordered list of all the elements b such that
a ~_m b. The result of an addition to a memory is a triple (Mem, bool, (list rZ)).
Since a memory can only represent a non-contradictory state, the boolean is set
to true if the addition of the equation gives a contradiction, to false otherwise.
The absolute values of the elements of the list are the locations of the arrays
that have been modified by the update. To perform the addition of a = b to m,
we first compare (min_m a) and (min_m b). If (min_m a) = (min_m b), the result
is (m, ⊥, []). If (min_m a) = −(min_m b), the result is (m, ⊤, []). If (min_m a) <
(min_m b), the result is (m′, ⊥, [(min_m a)] ∪ (l_m b)) where m′ is obtained from
m by setting the locations corresponding to the elements of (l_m b) to (ref ±(min_m a))
and the location |(min_m a)| to (class (±(l_m a) ∪ ±(l_m b))). The case
where (min_m b) < (min_m a) is symmetric to the previous one.


Intersection The function that computes the intersection takes three memories
m1, m2, m3 and two lists d1, d2 under the hypothesis that m1 ⊂ m2, m1 ⊂ m3,
d1 is the difference list between m1 and m2, and d2 is the difference list between
m1 and m3. It returns a 4-tuple (m1′, m2′, m3′, d′) such that m1′ = m2 ∩ m3,
m2′ = m1 = m3′, and d′ is the difference list between m1 and m1′. It proceeds by
successive additions to m1 of equations ai = bi where the ai are the elements of
d1 ∩ d2 and the bi are the smallest element such that ai ~_m2 bi and ai ~_m3 bi.


5.2 Propagation


The implementation of the one-step propagation is a composition of checking
equalities and adding equations. It has the type Mem → triplet → (Mem, bool,
(list rZ)). To do the propagation, we need to define a way to select triplets. As
the difference lists give the variables whose values have been modified, triplets
containing these variables are good candidates for applying one-step propagation.
The type of the propagation function is Mem → (list rZ) → (Mem, bool, (list rZ)).
The difference lists resulting from the application of the one-step propagation are
then recursively used for selecting triplets. This way of propagating terminates
since we cannot add infinitely many equations to a memory.


5.3 Dilemma 


We have implemented only the instances of the dilemma rule that are of practical
use: dilemma1, the dilemma with (a, ⊤) for an arbitrary a, and dilemma2, the
dilemma with ((a, ⊤), (b, ⊤)) for arbitrary a and b. To perform the first one, we
use three memories, one for each branch and one to compute the intersection.
For the second one we use an extra memory. The first memory m1 is used to
compute each branch iteratively. The intermediate result is stored in the second
memory m2. At the end of each iteration we compute the intersection of m1 and
m2 using the third memory m3. We then switch m2 and m3 and use the last
memory to reset m1 and m3 before proceeding to the next branch. Note that a
dilemma with any number of variables could be implemented in the same way
using four memories.


5.4 Stålmarck


At this stage, we have to decide on the strategy for picking variables for the
application of the dilemma rules. Our heuristics are very simple and could be largely
improved. We first add the initial equation and propagate. If no contradiction
is reached, we iterate the application of the function dilemma1 using minimal
variables starting from +1 to +maxN. We perform this operation until a
contradiction is reached or no more information is gained. In the second case, we do
a double iteration with a, b such that 0 < a < b < maxN using the function
dilemma2. Implementing this naive strategy is straightforward and gives us the
function doStal for which we have proved the following property:


Theorem doStalCorrect:
∀e: Expr. (doStal e) = ⊤ ⇒ (Tautology e).


Note that it is the only property that we have proved for our implementation
and clearly it is not sufficient. An algorithm that always returns ⊥ would satisfy
the above property. While our implementation is not complete since we use
dilemma rules only up to four variables, we could prove some liveness property.
This is feasible and would require formalizing the notion of n-hard formulae as
described in [10].
Fig. 4. Some benchmarks on a Pentium 450


5.5 Benchmark 


Once the implementation has been proved correct in Coq, the extraction
mechanism [9] enables us to get a functional Ocaml [8] program. The result is 1400
lines long. To be able to run such a program, we need to provide an
implementation of the arrays that satisfies the axioms of Appendix B. A first possibility is
to use balanced trees to implement functional arrays. A second possibility is to
use tagged arrays, since we have taken special care in the implementation in
order to be able to use destructive arrays. The tag in the array prohibits illegal
accesses. For example, the set function for such arrays looks like:


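exception Prohibited_access

(* Destructive update guarded by a validity tag: the old handle is
   invalidated and a fresh handle on the same storage is returned. *)
let set tar m v = match tar with
    (ar,tag) -> if !tag then
                  (tag := false; Array.set ar m v; (ar, ref true))
                else raise Prohibited_access;;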


If the program terminates without exception, the result is correct. Table 4 gives
some execution times on standard examples taken from [5]. For each problem,
we give which level of dilemma rules is needed, the number of variables, the
number of connectives and compare three versions of the algorithm: the algorithm
directly hand-coded in Ocaml with slightly different heuristics, and our certified
version with balanced trees and with tagged arrays. The time includes parsing the
formula and generating triplets. Even though the performance of an
implementation largely depends on the heuristics, our certified version seems comparable
with the hand-coded one and the one presented by John Harrison in [5]. However,
we are aware that in order to get a realistic implementation, much more work on
optimizations and heuristics has to be done. Prover 4.0, the propositional prover
of Prover Technology, takes at most two tenths of a second to conclude on the
examples given in Table 4.
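
The tagged-array discipline can be illustrated with a small self-contained sketch. The set function follows the one quoted above; the type name tarray and the functions make and get are hypothetical additions for the example and are not taken from the paper's code.

```ocaml
exception Prohibited_access

(* A tagged array pairs the mutable storage with a validity flag. *)
type 'a tarray = 'a array * bool ref

let make n v : 'a tarray = (Array.make n v, ref true)

(* Reading through an invalidated handle is an illegal access. *)
let get ((ar, tag) : 'a tarray) i =
  if !tag then ar.(i) else raise Prohibited_access

(* Update in place, invalidate the old handle, return a fresh one. *)
let set ((ar, tag) : 'a tarray) i v : 'a tarray =
  if !tag then begin
    tag := false;
    ar.(i) <- v;
    (ar, ref true)
  end
  else raise Prohibited_access
```

A program that only ever uses the most recent handle behaves as if the array were functional, while any use of a stale handle raises Prohibited_access instead of silently reading updated storage.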


6 Conclusion


We hope that what we have presented shows that current theorem proving
technology can be effectively used to reason about algorithms and their
implementations. We have presented a formalization of Stålmarck’s algorithm that includes
formal proofs of the main properties of the algorithm. Then, we have proposed
a format of execution traces and proved that it is adequate. Such certificates
are important in practice. They represent a simple way of increasing the
confidence in the correctness of specific results. Prover Technology commercializes
the Prover Plug-In product for integration into CASE and EDA tools. Prover
Plug-In is based on Stålmarck’s method, and supports a trace format [7]
and an associated trace/proof checker. John Harrison [5] also presents a tactic
based on Stålmarck’s method for HOL [4] using traces: the search is handled by
a program that generates traces; the prover then uses these traces to build safe
derivations of theorems. Finally, the effort for deriving a certified
implementation is orthogonal to the one on traces since the correctness of results is ensured
once and for all.

From the point of view of theorem proving, the most satisfying aspect of this
work is the formalization of the algorithm. It is a relatively concise development
of 3200 lines including 80 definitions and 200 theorems. The proof of correctness
of the implementation is less satisfying. Proving the basic operations (addition
and intersection) took 2/3 of the effort and represents more than 6000 lines of
Coq. This does not reflect the effective difficulty of the task. The main reason
why deriving these basic operations has been so tedious is that most of the proofs
involve a fair amount of case-splitting. For example, proving properties of the
addition often requires taking into account the signs and the relative value of the
components of the equation. We have neither managed to abstract our theorems
enough nor got enough automation so that we do not have to operate on the
different cases manually. Moreover, the fact that we handle imperative features
such as arrays in a functional way is a bit awkward. We plan in the near future to
use improvements such as those presented in [3] to reason directly on imperative
programs inside Coq. Finally, while the overall experience is quite positive, we
strongly believe that for this kind of formalization to become common practice,
an important effort has to be made in order to make proof scripts readable
by non-specialists. In that respect, recent efforts such as [2,11,12,13] seem very
promising.
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A The Predicate for One-Step Propagation 


Inductive — = : State + triplet > State := 


‘Vp, q,r: rZ.VS: State.p~g-q > S Gpiaqer (q, T)t(r, L)+8 

se _ :Vp,g,r: rZ.VS: State.p~g-r => S p:=qur (9, 1)+(7, T)+8 
>e,,: Vp, 9,7: rZ.VS: State.gr~sr = S Sp:-geer (P; ee 

‘Vp, q,7: rZ.VS: State.qrg-r > S Sp:=qur (p, L)+S 

e,7 Vp, 9,7: 1Z.VS: State.prs T > S p:2q8r ee T)+S 

ae, Vp, 9,7: 72. VS: State.grs T => S p:nqeer (p,7r)+8 

x, :Vp,q,7: 1Z.VS: State.q~s L > S Sp:=qur (D, se 

se. :Vp,g,r: 7Z.VS: State.r~s T => S p:2g&r (Pp, g)tS 

~e,, :Vp,9g,7: 72.8: State.r~s 1 > S p:2gur ip, [+8 
=, :Vp,q.7 7Z. V8: State.prgq => S Sp:aq=r (17 

sa _ :Vp,q, 7: 1Z.VS: State.p~g-q => S -rp:=q=r (1, ines 
=, Vp,q,7: 7Z.VS: State.prgr > S p.xg=r (4, 7)+8 

= _ :Vp,g,r: rZ.VS: State.p~g-r => S Sp:2ger (G,L)+8 
a, :Vp, 9,7: 7Z.VS: State.qrgr => S Sp:aq=r (p, T)+S 

= Vp, 9,7: 1Z.VS: State.q~g -r = S pagar (p, L)+8 
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| =,7 Vp, 9,7: 72. VS: State.p~s T => S -rp:=9¢=r (9,7))+S 

| -.2,, :Vp,9,r: rZ.VS: State.p~g L => S 4p.2g=r (9,-7)*8 
| rar VP, 9,7: TZ.VS: State.qrg T => S —p:<g=r (p, 7) +5 

| —>=,,:Vp,q,7r: 7Z.VS: State.q~rsg 1 > S +y:2q=r (p,-1)+9 
| :Vp,q,r: rZ.VS: State.r~g T => S ~>p.<g=r (p, q)+8 

| —-a,, :Vp,¢,7: rZ. VS: State.r~s 1 > S p:<g=r (p,-9)+8. 


B Axioms for Arrays 


Parameter get: ∀n: nat. ∀A: Set. ∀Ar: (Array n A). ∀m: nat. ∀H: m < n. A.
Parameter set: ∀n: nat. ∀A: Set. ∀Ar: (Array n A). ∀m: nat. ∀H: m < n.
∀v: A. (Array n A).
Parameter gen: ∀n: nat. ∀A: Set. ∀f: nat → A. (Array n A).
Axiom setDef1: ∀n: nat. ∀A: Set. ∀Ar: (Array n A). ∀m: nat. ∀H: m < n.
∀v: A. (get n A (set n A Ar m H v) m H) = v.
Axiom setDef2: ∀n: nat. ∀A: Set. ∀Ar: (Array n A). ∀m1, m2: nat.
∀H1: m1 < n. ∀H2: m2 < n. ∀v: A. m1 ≠ m2 ⇒
(get n A (set n A Ar m1 H1 v) m2 H2) = (get n A Ar m2 H2).
Axiom genDef: ∀n: nat. ∀A: Set. ∀m: nat. ∀f: nat → A. ∀H: m < n.
(get n A (gen n A f) m H) = (f m).
Axiom getIrr: ∀n: nat. ∀A: Set. ∀Ar: (Array n A). ∀m1, m2: nat.
∀H1: m1 < n. ∀H2: m2 < n. m1 = m2 ⇒
(get n A Ar m1 H1) = (get n A Ar m2 H2).
Axiom setIrr: ∀n: nat. ∀A: Set. ∀Ar: (Array n A). ∀m1, m2: nat.
∀H1: m1 < n. ∀H2: m2 < n. ∀v: A. m1 = m2 ⇒
(set n A Ar m1 H1 v) = (set n A Ar m2 H2 v).


Abstract. This paper presents work on technology for transformational
proof and program development, as used by window inference calculi
and transformation systems. The calculi are characterised by a certain 
class of theorems in the underlying logic. Our transformation system 
TAS compiles these rules to concrete deduction support, complete with 
a graphical user interface with command-language-free user interaction 
by gestures like drag&drop and proof-by-pointing, and a development 
management for transformational proofs. It is generic in the sense that 
it is completely independent of the particular window inference or trans- 
formational calculus, and can be instantiated to many different ones; 
three such instantiations are presented in the paper. 


1 Introduction 


Tools supporting formal program development should present proofs and pro- 
gram developments in the form in which they are most easily understood by 
the user, and should not require the user to adapt to the particular form of 
presentation as implemented by the system. Here, a serious clash of cultures 
prevails which hampers the wider usage of formal methods: theorem provers em- 
ploy presentations stemming from their roots in symbolic logic (e.g. Isabelle uses 
natural deduction), whereas engineers are more likely to be used to proofs by 
transformation as in calculus. As a way out of this dilemma, a number of systems 
have been developed to support transformational development. However, many 
of these systems such as CIP [3], KIDS [21] or PROSPECTRA [12] suffered from 
a lack of proof support and proven correctness. On the other hand, a variety of 
calculi have been developed which allow formal proof in a transformational way 
and are proven correct [8,9,10,28,11,2], some even with a graphical user interface 
[14,6]. However, what has been lacking is a systematic, generic and reusable way 
to obtain a user-friendly tool implementing transformational reasoning, with an 
open system architecture capable of coping with the fast changes in technology in 
user interfaces, theorem provers and formal methods. Reusability of components 
is crucial, since we hope that the considerable task of developing appropriate 
GUIs for formal method tools can be shared with other research groups. 

In [15], we have proposed an open architecture to build graphical user in- 
terfaces for theorem provers in a functional language; here, we instantiate this 
architecture with a generic transformation system which implements transformational
calculi (geared towards refinement proofs) on top of an LCF-like prover.
By generic, we mean that the system takes a high-level characterisation of a 
refinement calculus and returns a user-friendly, formally correct transformation 
or window inference system. The system can be used for various object logics 
and formal methods (a property for which Isabelle is particularly well suited as 
a basis). The instantiation of the system is very straightforward once the formal 
method (including the refinement relation) has been encoded. Various aspects 
of this overall task have been addressed before, such as logical engines, window- 
inference packages and prototypical GUIs. In contrast, TAS is an integrated 
solution, bringing existing approaches into one technical framework, and filling 
missing links like a generic pretty-printer producing markups in mathematical 
text. 

This paper is structured as follows: in Sect. 2 we give an introduction to win- 
dow inference, surveying previous work and presenting the basic concepts. We 
explain how the formulation of the basic concepts in terms of ML theorems leads 
to the implementation of TAS. We demonstrate the versatility of our approach 
in Sects. 3, 4 and 5 by showing examples of classical transformational program
development, of process-oriented refinement proofs and of data-oriented
refinement proofs. Sect. 6 finishes with conclusions and an outlook.


2 A Generic Scheme of Window Inference 


Window inference [18], structured calculational proof [8,1,2] and transforma- 
tional hierarchical reasoning [11] are closely related formalisations of proof by 
transformation. In this paper, we will use the format of [1], although we will 
refer to it as window inference. 


2.1 An Introduction to Window Inference 
As a motivating example, consider the proof for ⊢ (A ∧ B ⇒ C) ⇒ (B ∧ A ⇒ C).


In natural deduction, a proof would look like (in the notation of [27]; we assume 
that the reader is roughly familiar with derivations like this): 


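 [B ∧ A]¹        [B ∧ A]¹
 --------- ∧E    --------- ∧E
     A               B
 -------------------------- ∧I
           A ∧ B                 [A ∧ B ⇒ C]²
 --------------------------------------------- ⇒E
                      C
             ------------------ ⇒I¹
                 B ∧ A ⇒ C
       ------------------------------ ⇒I²
        (A ∧ B ⇒ C) ⇒ (B ∧ A ⇒ C)                      (1)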


The following equivalent calculational proof is far more compact. We start 
with B A A => C. In the first step, we open a subwindow on the sub-expression 
BA A, denoted by the markers. We then transform the sub-window and obtain 
the desired result for the whole expression:


    ⌜B ∧ A⌝ ⇒ C                                        (2)
⇐     {focus on B ∧ A}
    • B ∧ A
    =     {∧ is commutative}
      A ∧ B
  · ⌜A ∧ B⌝ ⇒ C


The proof profits from the fact that we can replace equivalent subexpressions. 
This is formalised by window rules [11]. In this case the rule has the form 


     Γ ⊢ A = B
 -------------------
 Γ ⊢ E[A] = E[B]                                       (3)


where the second-order variable E stands for the unchanged context, while the
subterm A (the focus of the transformation) is replaced by the transformation.

Comparing this proof with the natural deduction proof, we see that in the lat- 
ter we have to decompose the context by applying one rule per operator, whereas 
the calculational proof employs second-order matching to achieve the same effect 
directly. Although in this format, which goes back to Dijkstra and Scholten [8],
proofs tend to be shorter and more abstract, there are known counterexamples 
such as proof by contradiction. 

In Grundy’s work [11], window inference proofs are presented in terms of 
natural deduction proofs. By showing every natural deduction proof can be con- 
structed using window inference rules, completeness of window inference for first- 
order logic is shown. This allows the implementation of window inference in a 
theorem prover. A similar technique underlies our implementation: the system 
constructs Isabelle proofs from window inference proofs. 

As was shown in [11,1], window inference proofs are not restricted to first- 
order logic or standard proof refinement, i.e. calculational proofs based on the 
implication and equality. It is natural to admit a family {R_i}_{i∈I} of reflexive and
transitive binary relations that enjoy a generalised form of monotonicity (in the 
form of (3) above). 

Extending the framework of window inference in these directions allows us to
profit from its intuitive conciseness not only in high-school mathematics and
traditional calculus, which deals with manipulating equations, but also in formal 
systems development, where the refinement of specifications is often the central 
notion. However, adequate user interface support is needed if we want to exploit 
this intuitive conciseness; the user interaction to set a focus on a subterm should 
be little more than marking the subterm with the mouse (point&click), otherwise 
the whole beneficial effect would be lost again. 


2.2. The Concepts 


Just as equality is at the heart of algebra, at the heart of window inference there
is a family of binary preorders (reflexive and transitive relations) {⊑_i}_{i∈I}. These
preorders are called the refinement relations. Practically relevant examples of
refinement relations in formal system development are impliedness S ⇐ P (used
for algebraic model inclusion, see Sect. 3), process refinement S ⊑_FD P (the
process P is more defined and more deterministic than the process S, see Sect. 4),
set inclusion (see Sect. 5), or arithmetic orderings for numerical approximations
[29]. An example for an infinite family of refinement relations in HOL is the
Scott-definedness ordering for higher-order function spaces (where the indexing
set I is given by the types):


f ⊑_{(α→β)×(α→β)→Bool} g ≡ ∀x. f x ⊑_{β×β→Bool} g x                    (4)


The refinement relations have to satisfy a number of properties, given as a 
number of theorems. Firstly, we require reflexivity and transitivity for all i € I: 


a ⊑_i a                            [Refl_i]
a ⊑_i b ∧ b ⊑_i c ⇒ a ⊑_i c        [Trans_i]


The refinement relations can be ordered. We say ⊑_i is weaker than ⊑_j if ⊑_i is
a subset of ⊑_j, i.e. if a ⊑_i b implies a ⊑_j b:


a ⊑_i b ⇒ a ⊑_j b                  [Weak_{i,j}]


The ordering is optional; in a given instantiation, the refinement relations may 
not be related at all. However, because of reflexivity, equality is weaker than any 
other relation, i.e. for all i ∈ I, the following is a derived theorem:¹


a = b ⇒ a ⊑_i b                                                        (5)


The main devices of window inferencing are the window rules shown in the
previous section:


(A ⇒ a ⊑_i b) ⇒ F a ⊑_j F b        [Mono^F_{i,j}]


Here, F can either be a meta-variable², or a constant-head expression, i.e. a term
of the form λy1 … ym. c x1 … xn with c a constant. Note how there are different
refinement relations in the premise and conclusion of the rule. Using a family of
rules instead of one monotonicity rule has two advantages: firstly, it allows us
to handle, on a case by case basis, instantiations where the refinement relations
are not congruences, and secondly, by allowing an additional assumption A in
the monotonicity rules, we get more assumptions when refining inside a context.
These contextual assumptions are crucial; many proofs depend on them.³


¹ In order to keep our transformation system independent of the object logic being
used, we do not include any equality per default, as different object logics may have
different equalities.

² In Isabelle, meta-variables are variables in the meta-logic, which are subject to
unification. Users of other theorem provers can think of them just as variables.

³ They already featured in the pioneering CIP-S system [3] in 1984.

Dependencies between refinement relations can be more complicated than
the restricted form of weakening rules [Weak_{i,j}] above may be able to express;
for example, (4) cannot be expressed by a weakening rule in either direction
because of the outermost quantifier on the right side. For this reason, there is a
further need for refinement conversions, i.e. tactical procedures that attempt to 
rewrite one refinement proof goal into another. 

To finish off the picture, we consider transformation rules. A transformation 
rule is given by a logical core theorem of the form 


A ⇒ (I ⊑_j O)                                                          (6)


where A is the application condition, I the input pattern and O the output 
pattern. In other words, transformation rules are theorems the conclusion of 
which is a refinement relation. 


2.3. Parameters 


The parameters for a transformation rule given by core theorem schema (6) are
meta-variables occurring in the output pattern O but not in the input pattern I.
After applying the transformation, a parameter occurs as a free meta-variable in 
the proof state. This is not always useful, hence parameters enjoy special support. 
In particular, in transformational program development (see Sect. 3) we have 
rather complex transformations with a lot of parameters and their instantiation 
is an important design decision. As a simple example, consider the theorem 


t = if b then t else t


which as a transformation rule from the left to the right introduces a case
distinction on b. This is not very helpful unless we supply a concrete value for b which
helps us to further develop t in the two different branches of the conditional
expression under the respective assumption that b holds, or does not.

TAS supports parameters by checking, when a transformation is applied, whether
it contains parameters and, if so, querying for their instantiation. It further allows
parameter instantiations to be stored, edited and reused. This avoids having to 
retype instantiations, which can get quite lengthy, and makes TAS suitable for 
transformational program development as well as calculational proof. 


2.4 The Trafos Package 


The Trafos package implements the basic window inferencing operations as 
Isabelle tactics, such as: 


– opening and closing subwindows,
– applying transformations,
– searching for applicable transformations,
– and starting and concluding developments.

In general, our implementation follows Staples’ approach [23], for example in the
use of the transitivity rules to translate the forward chaining of transformation
steps into backwards proofs on top of Isabelle’s goal package, or the reflexivity
rules to close subwindows or conclude developments. The distinctive features of
our implementation are the subterm and search functionalities, so we concentrate 
on these in the following. 

In order to open a subwindow or apply a transformation at a particular sub- 
term, Trafos implements an abstract datatype path and operations apply_trafo, 
open_sub taking such a path (and a transformation) as arguments. To allow 
direct manipulation by point&click, we extend Isabelle’s powerful syntax and 
pretty-printing machinery by annotations [15]. Annotations are markup sequen- 
ces containing a textual representation of the path, which are attached to the 
terms. They do not print in the user interface, but instead generate a binding 
which invokes the respective operations with the corresponding path as
argument. In general, users do not need to modify their theories to use the subterm
selection facilities; they can be used as they are, including user-defined
pretty-printing.⁴
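
The idea of addressing a subterm by a path can be sketched in OCaml as follows. This is an illustration only: the types term and path and the functions subterm and replace_at are hypothetical and do not reproduce the Trafos datatype path or the operations apply_trafo and open_sub, which work on Isabelle terms and produce proof steps.

```ocaml
(* Hypothetical first-order terms and paths (child indices from the root). *)
type term = Var of string | App of string * term list
type path = int list

(* The subterm at path p, if the path stays inside the term. *)
let rec subterm t p =
  match t, p with
  | _, [] -> Some t
  | App (_, args), i :: rest ->
      (match List.nth_opt args i with
       | Some a -> subterm a rest
       | None -> None)
  | Var _, _ :: _ -> None

(* Apply f to the subterm at path p and rebuild the unchanged context.
   An index beyond the argument list leaves the term as it is. *)
let rec replace_at t p f =
  match t, p with
  | _, [] -> f t
  | App (h, args), i :: rest ->
      App (h, List.mapi (fun j a -> if j = i then replace_at a rest f else a) args)
  | Var _, _ :: _ -> invalid_arg "replace_at: path leaves the term"
```

With the term for B ∧ A ⇒ C and the path [0], replace_at applied with a function that swaps the arguments of a conjunction yields the term for A ∧ B ⇒ C, mirroring the focus step of derivation (2).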

The operations apply_trafo and open_sub analyse the context, and for each
operation making up the context, the most specific [Mono^F_{i,j}] rule is selected, and
a proof step is generated. In order to speed up this selection, the monotonicity 
rules are indexed by their head symbol, so we can discard rules which cannot 
possibly unify; still, the application of the selected rules may fail, so a tactic 
is constructed which tries to apply any combination of possibly fitting rules, 
starting with the most specific. 
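
Indexing rules by head symbol can be pictured with a small OCaml sketch. The record rule, its specificity field and the functions add_rule and candidates are assumptions made for illustration; the actual package indexes Isabelle theorems and builds tactics from them.

```ocaml
(* Hypothetical rule record: only the head symbol and a specificity rank
   matter for this sketch. *)
type rule = { name : string; head : string; specificity : int }

let index : (string, rule) Hashtbl.t = Hashtbl.create 16

(* Register a rule under its head symbol. *)
let add_rule r = Hashtbl.add index r.head r

(* Rules that could possibly unify with the given head symbol,
   most specific first. *)
let candidates head =
  Hashtbl.find_all index head
  |> List.sort (fun a b -> compare b.specificity a.specificity)
```

Rules filed under another head symbol are never inspected, which corresponds to discarding rules that cannot possibly unify. The candidates that remain may still fail to apply and are tried in order.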

Further, for each refinement relation ⊑_i, we try to find a rule [Mono^F_{i,i}] where
F is just a meta-variable and the condition A is void; this rule would state
that ⊑_i is a congruence. If we can find such a rule, we can use it to handle, in
one step, large parts of the context consisting of operations for which no more 
specific rule can be found. If no such congruence rule can be found, we do not 
construct a step-by-step proof but instead use Isabelle’s efficient rewriter, the 
simplifier, with the appropriate rules to break down larger contexts in one step. 

As an example of why the more specific rules are applied first, consider the
expression $E = x + (\mathrm{if}\ x = 0\ \mathrm{then}\ u + x\ \mathrm{else}\ v + x)$. If we want to simplify
$u + x$, then we can do so under the assumption that $x = 0$, and we have
$x = 0 \Longrightarrow u + x = u$ because of the theorem


\[ (B \Longrightarrow x = y) \Longrightarrow (\mathrm{if}\ B\ \mathrm{then}\ x\ \mathrm{else}\ z = \mathrm{if}\ B\ \mathrm{then}\ y\ \mathrm{else}\ z) \qquad [\mathrm{Mono}^{\mathit{If}}_{=}] \]


But if we had just used the congruence rule for equality $x = y \Longrightarrow f\,x = f\,y$,
we would have lost the contextual assumption $x = 0$ in the refinement of the
if-branch of the conditional.

When looking for applicable transformations, performance becomes an issue, 
and there is an inherent trade-off between the speed and accuracy of the search. 


* Except if Isabelle’s freely programmable so-called print translations are used (which 
is rarely the case). In this case, there are facilities to aid in programming markup- 
generation analogously to these print-translations. 
In principle, we have to go through all theorems in Isabelle’s database and check
whether they can be considered as a transformation rule, and if so, whether the input
pattern of the rule matches. Many theorems can be excluded straight away since 
their conclusion is not a refinement. For the rest, we can either superficially check 
whether they might fit, which is much faster but bears the risk of returning rules 
which actually do not fit, or we can construct and apply the relevant tactic. We 
let users decide (by setting a search option) whether they want fast or accurate 
search. Another speed-up heuristic is to be able to specify that rules are only 
collected from certain theories (called active theories). Finally, users can exclude 
expanding rules (where the left-hand side is only a variable), because most (but 
not all) of the time these are not really helpful. In this way, users can guide the 
search for applicable transformations by selecting appropriate heuristics. 
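The search options described above can be sketched as a simple filter pipeline. This is a Python illustration only: the `Theorem` record, its fields and the `try_tactic` callback are hypothetical and do not correspond to Isabelle’s theorem database API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Theorem:
    """Hypothetical record for a stored theorem."""
    name: str
    theory: str
    is_refinement: bool    # conclusion is a refinement
    lhs_is_variable: bool  # "expanding" rule: left-hand side is only a variable
    lhs_head: str          # head symbol of the input pattern


def search(theorems, focus_head, *, active_theories=None,
           exclude_expanding=True, accurate=False, try_tactic=None):
    """Return names of rules that may apply at a subterm with head `focus_head`."""
    if accurate and try_tactic is None:
        raise ValueError("accurate search needs a try_tactic callback")
    hits = []
    for thm in theorems:
        if not thm.is_refinement:
            continue  # excluded straight away
        if active_theories is not None and thm.theory not in active_theories:
            continue  # only collect from active theories
        if thm.lhs_is_variable:
            if exclude_expanding:
                continue
        elif thm.lhs_head != focus_head:
            continue  # fast, superficial check of the input pattern
        if accurate and not try_tactic(thm):
            continue  # slow check: actually attempt the rule
        hits.append(thm.name)
    return hits


db = [
    Theorem("add_0", "Arith", True, False, "+"),
    Theorem("add_comm", "Arith", True, False, "+"),
    Theorem("expand", "Arith", True, True, ""),
    Theorem("le_refl", "Arith", False, False, "<="),
    Theorem("append_nil", "List", True, False, "@"),
]
fast = search(db, "+", active_theories={"Arith"})
exact = search(db, "+", accurate=True, try_tactic=lambda t: t.name == "add_0")
print(fast, exact)
```

The fast search may return rules that do not actually fit (here `add_comm` under the assumed callback), which is exactly the trade-off between speed and accuracy mentioned in the text.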

When instantiating the functor Trafos, the preprocessing of the monotonicity
rules as described above takes place (calculation of the simplifier sets, head
constants, etc.). Further, some consistency checks are carried out (e.g. that there
are transitivity and reflexivity rules for all refinement relations). 


2.5 Genericity by Functors 


In Standard ML (SML), modules are called structures. Signatures are module 
types, describing the interface, and functors are parameterised modules, map- 
ping structures to structures. Since in LCF provers theorems are elements of an 
abstract SML datatype, we can describe the properties of a window inference 
calculus as described in Sect. 2.2 above using SML’s module language, and
implement TAS as a functor, taking a structure containing the necessary theorems,
and returning a transformation or window inferencing system complete with 
graphical user interface built on top of this: 


functor TAS(TrfThy: TRAFOTHY) = ... 


The signature TRAFOTHY specifies a structure which contains all the theorems of 
Sect. 2.2. Abstracted a little (by omitting some parameters for special tactical 
support), it reads as follows: 


signature TRAFOTHY =
sig
  val topthy   : string
  val refl     : thm list
  val trans    : thm list
  val weak     : thm list
  val mono     : thm list
  val ref_conv : (string * (int -> tactic)) list
end


To instantiate TAS, we need to provide a theory (named topthy) which encodes
the formal method of our choice and where our refinement lives, theorems describing
the transitivity, reflexivity and monotonicity of the refinement relation(s),
and a list of refinement conversions, each of which consists of a name and a tactic
which, when applied to a particular subgoal, converts the subgoal into another
refinement relation.
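To make the role of the functor argument concrete, the following Python sketch mimics a structure matching TRAFOTHY and a functor-like factory that performs the consistency check mentioned in Sect. 2.4 (every refinement relation needs a reflexivity and a transitivity rule). The `Thm` record, which tags each theorem with the relation it is about, and the function `make_tas` are assumptions for illustration; in the real system theorems are values of Isabelle’s abstract type thm.

```python
from dataclasses import dataclass
from typing import NamedTuple


class Thm(NamedTuple):
    """Hypothetical theorem: a name plus the refinement relation it concerns."""
    name: str
    relation: str


@dataclass(frozen=True)
class TrafoThy:
    """Python analogue of a structure matching the signature TRAFOTHY."""
    topthy: str
    refl: tuple
    trans: tuple
    weak: tuple = ()
    mono: tuple = ()
    ref_conv: tuple = ()


def make_tas(thy):
    """Functor-like factory: check the argument, then build the system."""
    relations = {t.relation for t in thy.refl + thy.trans + thy.mono}
    for rel in sorted(relations):
        for kind, thms in (("reflexivity", thy.refl),
                           ("transitivity", thy.trans)):
            if not any(t.relation == rel for t in thms):
                raise ValueError(f"no {kind} rule for relation {rel!r}")
    return {"theory": thy.topthy, "relations": sorted(relations)}


hol_ref = TrafoThy(
    topthy="HolRef",
    refl=(Thm("ref_refl", "ref"),),
    trans=(Thm("ref_trans", "ref"),),
    mono=(Thm("ref_if", "ref"),),
)
tas = make_tas(hol_ref)
print(tas)
```

Performing the check once, at instantiation time, means that an incomplete set of theorems is rejected before any development is started.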

When applying this functor by supplying appropriate arguments, we obtain 
a structure which implements a window inferencing system, complete with a gra- 
phical user interface. The graphical user interface abstracts from the command 
line interface of most LCF provers (where functions and values are referred to by 
names) by implementing a notepad, on which objects (theorems, theories, etc.) 
can be manipulated by drag&drop. It provides a construction area where the
current on-going proof is displayed, and which has a focus to open subwindows, 
apply transformations to subterms or search the theorem database for applicable 
transformations. We can navigate the history (going backwards and forwards), 
and display the history concisely, or in detail through an active display, which 
allows us to show and hide subdevelopments. Further, the user interface provides 
an active object management (keeping track of changes to external objects like 
theories), and a session management which allows us to save the system state and
return to it later. All of these features are available for any instance of TAS, and 
require no additional implementation; and this is what we mean by calling TAS 
generic. 

The implementation of TAS consists of two components: a kernel transfor- 
mation system, which is the package Trafos as described in Sect. 2.4, and a 
graphical user interface on top of this. We can write this simplified as 


functor TAS(TrfThy : TRAFOTHY) = GenGUI(Trafos(TrfThy : TRAFOTHY)) 


The graphical user interface is implemented by the functor GenGUI, and is 
independent of Trafos and Isabelle. For a detailed description, we refer to [15], 
but in a nutshell, the graphical user interface is implemented entirely in SML, 
using a typed functional encapsulation of Tcl/Tk called sml_tk. Most of the
GUI features mentioned above (such as the notepad, and the history, object and 
session management) are implemented at this more general level. 

The division of the implementation into a kernel system and a generic gra- 
phical user interface has two major advantages: firstly, the GUI is reusable for 
similar applications (for example, we have used it to implement a GUI IsaWin 
to Isabelle itself); and secondly, it allows us to run the transformation system 
without the graphical user interface, e.g. as a scripting engine to check proofs. 


3 Design Transformations in Classical Program 
Transformation 


In the design of algorithms, certain schemata can be identified [7]. When such
a schema is formalised as a theorem in the form of (6), we call the resulting
transformation rule a design transformation. Examples include divide and
conquer [20], global search [22] or branch and bound. Recall from Sect. 2.2
that transformation rules are represented by a logical core theorem with an
input pattern and an output pattern. Characteristically, design transformations
have as input pattern a specification, and as output pattern a program.
Here, a specification is given by a pre- and a postcondition, i.e. a function
$f : X \to Y$ is specified by an implication $\mathit{Pre}(x) \longrightarrow \mathit{Post}(x, f(x))$, where
$\mathit{Pre} : X \to \mathit{Bool}$, $\mathit{Post} : X \times Y \to \mathit{Bool}$. A program is given by a recursive
scheme, such as well-founded recursion; the proof of the logical core theorem
must accordingly be based on the corresponding induction principles, i.e. here
well-founded induction. Thus, a function $f : X \to Y$ can be given as


let fun f(x) = E in f end measure <        (7)


where E is an expression of type Y, possibly containing f, and $< \;\subseteq X \times X$ is a
well-founded relation, the measure, which must decrease with every recursive call
of f. The notational proximity of (7) to SML is intended: (7) can be considered
as a functional program.

As refinement relation, we will use model-inclusion — when refining a speci- 
fication of some function f, the set of possible interpretations for f is reduced. 
The logical equivalent of this kind of refinement is the implication, which leads 
to the following definition: 


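\[ \sqsubseteq \;:\; \mathit{Bool} \times \mathit{Bool} \to \mathit{Bool} \qquad\qquad P \sqsubseteq Q \;\stackrel{\mathrm{def}}{=}\; Q \longrightarrow P \]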


Based on this definition, we easily prove the theorems ref_trans and ref_refl 
(transitivity and reflexivity of $\sqsubseteq$). We can also prove that $\sqsubseteq$ is monotone for all
boolean operators, e.g. 


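\[ s \sqsubseteq t \Longrightarrow s \land u \sqsubseteq t \land u \qquad \text{ref\_conj1} \]

Most importantly, we can show that


\[ (B \Longrightarrow s \sqsubseteq t) \Longrightarrow \mathrm{if}\ B\ \mathrm{then}\ s\ \mathrm{else}\ u \sqsubseteq \mathrm{if}\ B\ \mathrm{then}\ t\ \mathrm{else}\ u \qquad \text{ref\_if} \]


\[ (\lnot B \Longrightarrow u \sqsubseteq v) \Longrightarrow \mathrm{if}\ B\ \mathrm{then}\ s\ \mathrm{else}\ u \sqsubseteq \mathrm{if}\ B\ \mathrm{then}\ s\ \mathrm{else}\ v \qquad \text{ref\_then} \]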


which provides the contextual assumptions mentioned above. When instantiating 
the functor, we also have to specify equality as a refinement relation. Since we can 
reuse the relevant definitions for all theories based on HOL, they have been put
in a separate functor

functor HolEqTrfThy(TrfThy : TRAFOTHY) : TRAFOTHY

In particular, this functor proves the weakening theorems (5) for all refinement
relations, and appends them to the list weak. Thus, the full functor instantiation
reads


structure HolRefThy =
struct
  val name     = "HolRef"
  val trans    = [ref_trans]
  val refl     = [ref_refl]
  val weak     = []
  val mono     = [ref_if, ref_else, ref_conj1, ref_conj2,
                  ref_disj1, ref_disj2, ...]
  val ref_conv = []
end
structure TAS = TAS(HolEqTrfThy(HolRefThy))
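Because the refinement relation used here is just reversed implication on Booleans, the propositional shape of the theorems supplied to the functor can be sanity-checked by brute force over truth values. The Python sketch below does this for reflexivity, transitivity, monotonicity of conjunction and the two conditional rules. It is only a truth-table check of the propositional shape, not a substitute for the Isabelle proofs; all function names are chosen for illustration.

```python
from itertools import product


def refines(p, q):
    """P refines to Q (P ⊑ Q) iff Q implies P."""
    return (not q) or p


def implies(a, b):
    return (not a) or b


def ite(b, s, t):
    return s if b else t


BOOLS = (False, True)

ref_refl = all(refines(p, p) for p in BOOLS)
ref_trans = all(
    implies(refines(p, q) and refines(q, r), refines(p, r))
    for p, q, r in product(BOOLS, repeat=3)
)
ref_conj1 = all(
    implies(refines(s, t), refines(s and u, t and u))
    for s, t, u in product(BOOLS, repeat=3)
)
# The premise s ⊑ t only has to hold under the contextual assumption b.
ref_if = all(
    implies(implies(b, refines(s, t)),
            refines(ite(b, s, u), ite(b, t, u)))
    for b, s, t, u in product(BOOLS, repeat=4)
)
# Symmetric rule for the else-branch, under the assumption (not b).
ref_else_branch = all(
    implies(implies(not b, refines(u, v)),
            refines(ite(b, s, u), ite(b, s, v)))
    for b, s, u, v in product(BOOLS, repeat=4)
)
print(ref_refl, ref_trans, ref_conj1, ref_if, ref_else_branch)
```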


The divide and conquer design transformation [20] implements a program
$f : X \to Y$ by splitting X into two parts: the termination part of f, which
can be directly embedded into the codomain Y of f, and the rest, where the
values are divided into smaller parts, processed recursively, and reassembled.
The core theorem for divide and conquer based on model-inclusion refinement
and well-founded recursion reads:⁵


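\[
\begin{array}{l}
A \Longrightarrow \bigl(\mathit{Pre}(x) \longrightarrow \mathit{Post}(x, f(x)) \\
\quad \sqsubseteq \\
\quad \mathit{Pre}(x) \longrightarrow f = \mathbf{let\ fun}\ F(x) = \mathbf{if}\ \mathit{isPrim}(x)\ \mathbf{then}\ \mathit{Dir}(x) \\
\qquad\qquad\qquad\qquad\qquad\qquad \mathbf{else}\ \mathit{Com}(\langle G, F\rangle(\mathit{Decom}(x))) \\
\qquad\qquad\qquad\quad \mathbf{in}\ F\ \mathbf{end\ measure} <\bigr)
\end{array}
\tag{8}
\]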


As explained above, the parameters of the transformation are the meta-variables 
appearing in the output pattern but not in the input pattern of the logical core 
theorem (8). Here, these are 


— the termination criterion $\mathit{isPrim} : X \to \mathit{Bool}$;

— the embedding of terminal values $\mathit{Dir} : X \to Y$;

— the decomposition function of input values $\mathit{Decom} : X \to Z \times X$;

— a function $G : Z \to U$ for those values which are not calculated by recursive
calls of F;

— the composition function $\mathit{Com} : U \times Y \to Y$ that joins the subsolutions given
by G and recursive calls of F;

— and the measure < assuring termination.


We will now apply this transformation to synthesise a sorting algorithm in 
the theory of lists. We start with the usual specification of sort, as shown on 
the left of Fig. 1. We can see the notepad, on which the transformation object 
Divide & Conquer is represented by an icon. The workspace shows the current 
state of the already started development. The highlighting indicates the focus 
set by the user. Now we drag the transformation onto the focus; TAS interprets 
this gesture as application of the transformation at the focus. In this case, TAS 
infers that there are parameters to be provided by the user, who is thus guided 
to the necessary design decisions. The parameter instantiations are fairly simple: 
the termination condition is the empty list, which is sorted (hence Dir is the 
identity). The decomposition function splits off the head and the tail; the tail is 
sorted recursively, and the head is inserted into the sorted list (hence, G is the 
identity). Finally, the measure relates non-empty lists to their tails (since the 


⁵ $\langle f, g\rangle$ is the pairing of functions defined as $\langle f, g\rangle(x, y) \stackrel{\mathrm{def}}{=} (f(x), g(y))$.
Fig. 1. TAS and its graphical user interface. To the left, the initial stage of the development,
and the parameters supplied for the transformation; to the right, the development
after applying the divide and conquer transformation. On the top of the window, we 
can see the notepad with the theory SortDC, the transformation Divide&Conquer, the 
specification sort_spec, the ongoing development (shaded) and the parameter instan- 
tiation divconq_inst. 


recursive call always passes the tail of the argument; a relation easily proven to 
be well-founded). 
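The program schema on the right-hand side of (8), together with the parameter instantiation just described, can be written out as a short Python sketch. Python stands in for the SML-like notation of (7); the helper `insert` and all identifiers are chosen for illustration, and termination (the measure) is assumed rather than checked.

```python
def divide_and_conquer(is_prim, dir_, decom, g, com):
    """Build F from the parameters of the divide and conquer schema.

    F(x) = if isPrim(x) then Dir(x) else Com(<G, F>(Decom(x)))
    The measure is not represented; termination is the caller's obligation.
    """
    def f(x):
        if is_prim(x):
            return dir_(x)
        z, rest = decom(x)
        return com(g(z), f(rest))
    return f


def insert(a, sorted_list):
    """Insert a into an already sorted list, keeping it sorted."""
    for i, b in enumerate(sorted_list):
        if a <= b:
            return sorted_list[:i] + [a] + sorted_list[i:]
    return sorted_list + [a]


sort = divide_and_conquer(
    is_prim=lambda xs: xs == [],       # termination criterion: empty list
    dir_=lambda xs: xs,                # the empty list is already sorted
    decom=lambda xs: (xs[0], xs[1:]),  # split into head and tail
    g=lambda a: a,                     # the head is passed on unchanged
    com=insert,                        # insert head into the sorted tail
)
print(sort([3, 1, 2]))
```

With these parameters the schema yields insertion sort; each recursive call receives the tail of its argument, which is the decreasing measure mentioned in the text.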

This transformation step readily produces the desired program (right of 
Fig. 1). However, this step is only valid if the application conditions of the 
transformation hold. When applying a transformation, these conditions turn
into proof obligations, which are tracked by a special bookkeeping mechanism. The proof obligations
can be proven with a number of proof procedures. Typically, these include au- 
tomatic proof via Isabelle’s simplifier or classical reasoner and interactive proof 
via IsaWin. Depending on the particular logic, further proof procedures may 
be at our disposal, such as specialised tactics or model-checkers integrated into 
Isabelle. 

Another well-known scheme in algorithm design is global search which has 
been investigated formally in [22]. It represents another powerful design trans- 
formation which has already been formalised in an earlier version of TAS [13]. 


4 Process Modelling with CSP 


This section shows how to instantiate TAS for refinement with CSP [19], and
will briefly present an example of how the resulting system can be used. CSP is a
language designed to describe systems of interacting components. It is supported
by an underlying theory for reasoning about their equivalences, and in particular
their refinements. In this section, we use the embedding HOL-CSP [26] of CSP
into Isabelle/HOL. Even though shortage of space precludes us from setting out the
basics of CSP here, a detailed understanding of CSP is not required in the
following; suffice it to say that CSP is a language to model distributed programs
as communicating processes.


Fig. 2. TAS in the CSP instance. On the right, the construction history is shown. The
development proceeded by subdevelopments on COPY1 and COPY2, which can be shown 
and hidden by clicking on [Subdevelopment] . Similarly, proof obligations can be shown 
and hidden. In the lower part of the main window, the focus is set on a subterm, and all 
applicable transformations are shown. By clicking on the name of the transformations, 
their structure can be displayed (not shown). 


CSP is interesting in this context because it has three refinement relations, 
namely trace refinement, failures refinement and failures-divergence refinement. 
Here, we only use the third, since it is the one most commonly used when deve- 
loping systems from specifications, but e.g. trace refinement can be relevant to 
show security properties. 

Recall from Sect. 2.5 that to instantiate TAS we need a theory encoding our 
formal method, and theorems describing the refinement relation. The relevant 
theory is called CspTrafos, which contains the core theorems of some (simple) 
transformations built on top of Csp, the encoding of CSP into Isabelle/HOL. 

For brevity, we only describe instantiation with failures-divergence refinement;
the other two refinements would be similar. The theorems stating transitivity
and reflexivity of failures-divergence refinement are called ref_ord_trans and
ref_ord_refl, respectively. For monotonicity, we have a family of theorems 
describing monotonicity of the operators of CSP over this relation, but since the 
relation is monotone only with respect to the CSP relations it is not a proper 
congruence. This gives us the following functor instantiation: 


structure CspRefThy =
struct
  val name     = "CspTrafos"
  val trans    = [ref_ord_trans]
  val refl     = [ref_ord_refl]
  val mono     = [mono_mprefix_ref, mono_prefix_ref, mono_ndet_ref,
                  mono_det_ref, mono_Ren_ref, mono_hide_set_ref,
                  mono_PalI_ref, mono_Inter_ref]
  val weak     = []
  val ref_conv = []
end
structure TAS = TAS(HolEqTrfThy(CspRefThy))

Fig. 2 shows the resulting, instantiated system in use. We can see an ongoing
development on the left, and the opened construction history showing the development
up to this point on the right. As we see, the development started with two
processes in parallel; we focussed on both of these in turn to develop them sepa- 
rately, and afterwards rearranged the resulting process, using algebraic laws of 
CSP such as sync_interl_dist which states the distributivity of synchronisation 
over interleaving under some conditions. The development does not use powerful 
design transformations as in Sect. 3, but just employs a couple of the algebraic 
laws of CSP, showing how we can effectively use previously proven theorems for 
transformational development. Finding design transformations like divide and 
conquer for CSP is still an open research problem. 

If we restrict ourselves to finite state processes (by requiring that the channels 
only carry finite messages), then we can even check the development above with 
the CSP model checker FDR [19], connected to Isabelle as a so-called oracle (a
trusted external prover). This speeds up development at the cost of generality 
and can e.g. be used for rapid prototyping. 


5 Data Refinement in the Refinement Calculus 


In this section, we will emphasise a particular aspect of the genericity of TAS and 
demonstrate its potential for reuse of given logical embeddings. As we mentioned, 
TAS is generic with respect to the underlying refinement calculus, which in par- 
ticular means that it is generic with respect to the underlying object logic. In the 
previous examples, we used higher-order logic (as encoded in Isabelle/HOL); in 
this example, we will use Zermelo-Fraenkel set theory (as encoded in Isabelle/ZF).
On top of Isabelle/ZF, Mark Staples has built a substantial theory for impera- 
tive program refinement and data refinement [24,25] following the lines of Back’s 
Refinement Calculus RC [2]. 
RC is based on a weakest precondition semantics, where predicates and predicate
transformers are represented as sets of states and functions taking sets of
states to sets of states respectively. The distinctive feature of Staples’ work over 
previous implementations of refinement calculi is the use of sets in the sense of 
ZF based on an open type universe. This allows derivations where the types of 
program variables are unknown at the beginning, and become more and more 
concrete after a sequence of development steps. 

In order to give an idea of Staples’ formalisation, we very briefly review some 
of the definitions of Back’s core language in his presentation:® 


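\[
\begin{array}{rcl}
\mathit{Skip}_A & \stackrel{\mathrm{def}}{=} & \lambda q : \mathbb{P}(A).\ q \\
a ; b & \stackrel{\mathrm{def}}{=} & \lambda q : \mathrm{dom}(b).\ a\,`\,b\,`\,q \\
\mathbf{if}\ g\ \mathbf{then}\ a\ \mathbf{else}\ b\ \mathbf{fi} & \stackrel{\mathrm{def}}{=} & \lambda q : \mathrm{dom}(a) \cup \mathrm{dom}(b). \\
 & & (g \cap a\,`\,q) \cup \bigl((\bigcup(\mathrm{dom}(a) \cup \mathrm{dom}(b)) - g) \cap b\,`\,q\bigr) \\
\mathbf{while}\ g\ \mathbf{do}\ c\ \mathbf{od} & \stackrel{\mathrm{def}}{=} & \lambda q : \mathbb{P}(A).\ \mathrm{lfp}_A\, N.\ (g \cap c\,`\,N) \cup ((A - g) \cap q)
\end{array}
\]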

This theory could be used for an instantiation of TAS, called TAS/RC. The
instantiation essentially follows the lines discussed in the previous sections; with
respect to the syntactic presentation, the configuration for the pretty-printing
engine had to provide special support for 5 print-translations comprising 100
lines of code, and a particular set-up for the tactics providing reasoning over
well-typedness, regularity and monotonicity. (We omit the details here for space
reasons.) As a result, a larger case study in [24] for the development of a BDD-related
algorithm as a data refinement from truth tables to decision trees can be
represented inside TAS.


6 Conclusions and Outlook 


This paper has presented the transformation system TAS. TAS is generic in 
the sense that it takes a set of theorems, describing a refinement relation, and 
turns them into a window inference or transformation system, complete with an 
easy-to-use, graphical user interface. This genericity means that the system can
be instantiated both to a transformation system for transformational program
development in the vein of traditional transformation systems such as CIP, KIDS
or PROSPECTRA, and to a system for window inference. We have demonstrated
this versatility by showing instantiations from each of the two
areas just mentioned, complemented with an instantiation from a different area,
namely reasoning about processes using CSP.

The effort required for the actual instantiation of TAS is very small indeed, 
since merely the values for the parameters of the functor need to be provided. 




® Note that the backquote operator ‘ is infix function application in Isabelle/ZF. 
(Only rarely will tactical programming be needed, such as mentioned in Sect. 5,
and even then it only amounts to a few lines of code.) It takes far more effort 
to set up the logical encoding of the formal method, in particular if one does so 
conservatively. 

TAS’ graphical user interface complements the intuitiveness of transforma- 
tional calculi with a command-language-free user interface based on gestures 
such as drag&drop and proof-by-pointing. It further provides technical infra- 
structure such as development management (replay, reuse, history navigation), 
object management and session management. 

TAS is implemented on top of the prover Isabelle, such that the consistency of 
the underlying logics and their rules can be ensured by the LCF-style architecture
of Isabelle and well-known embedding techniques. It benefits further from the 
LCF architecture, because we can use SML’s structuring mechanisms (such as 
functors) to implement reusable, generic proof components across a wide variety 
of logics. 

Internally, we spent much effort to organise TAS componentwise, easing the 
reuse of as much code as possible for completely different logical environments. 
The GUI and large parts of TAS (except the package Trafos) are designed to 
work with a different SML-based prover, and are readily available for other rese- 
arch groups to provide GUI support for similar applications. On the other hand, 
the logical embeddings (such as HOL-CSP) which form the basis of the transfor- 
mation calculi do not depend on TAS either. This allowed the easy integration 
of Staples’ encoding of the refinement calculus into our system, as presented in 
Sect. 5. 


6.1 Discussion and Related Work 


This work attempts to synthesise previous work on transformational program 
development [3,21,12] which developed a huge body of formalised developments 
and design schemes, but suffered from ad-hoc, inflexible calculi, correctness pro- 
blems and lack of proof support, with the work on window inferencing [18,11] 
and structured calculational proof [2,1], which provides proven correctness by 
LCF design and proof support from HOL or Isabelle. 

PRT [6] is a program refinement tool (using window inference) which is built 
on top of the Ergo theorem prover. It offers an interface based on Emacs, which 
allows development management and search functionalities. However, the Tk- 
WinHOL system [14] comes closest to our own system conception: it is based 
on Tcl/Tk (making it platform independent), and offers focusing with a mouse, 
drag&drop in transformational goals, and a formally proven sound calculus implemented
by derived rules in HOL. On the technical side it uses Tcl directly
instead of an encapsulation (which in our estimate will make it much harder 
to maintain). On the logical side, it is also generic in the sense that it can be 
used with different refinement relations, but requires more work to be adapted 
to a new refinement relation; for example, users need to provide a pretty-printer 
which generates the correct mark-up code to be able to click on subterms. In 
contrast, TAS extends Isabelle’s infrastructure (like the pretty-printer) into the 
graphical user interface, leaving the user with less work when instantiating the
system. 

The essential difference between window inferencing and structured calculational
proof [1] is that the latter can live with more than one transformational
goal. This difference is not that crucial for TAS since it can represent more 
than one transformational development on the notepad and is customisable for 
appropriate interaction between them via drag&drop operations. 

Another possible generalisation would be to drop the requirement that all 
refinement relations be reflexive. However, this would complicate the tactical 
programming considerably without offering us perceivable benefit at the mo- 
ment, so we have decided against it. 


6.2 Future Work 


Future work can be found in several directions. Firstly, the user interaction can 
still be improved in a variety of ways. Although in the present system, the user 
can ask for transformations which are applicable, this can considerably be im- 
proved by a best-fit strategy and, for example, stronger matching algorithms 
like AC-matching. The problem here is to help the user to find the few in- 
teresting transformations in the multitude of uninteresting (trivial, misleading) 
ones. Supporting design decisions at the highest possible user-oriented level must 
still count as an open problem, in particular in a generic setting. 

Secondly, the interface to the outside world can be improved. Ideally, the 
system should interface to a variety of externally available proof formats, and 
export web-browsable proof scripts. 

A rather more ambitious research goal is the reuse and abstraction of transformational
developments. A first step in this direction would be to allow
cut&paste manipulation of the history of a proof.

Thirdly, going beyond classical hierarchical transformational proofs the con- 
cept of indexed window inferencing [29] appears highly interesting. The overall 
idea is to add an additional parameter to the refinement relation that allows 
to calculate the concrete refinement relation on the fly during transformational 
deduction. Besides the obvious advantage of relaxing the requirements to refine- 
ment relations to irreflexive ones (already pointed out in [23]), indexed window 
inferencing can also be used for a very natural representation of operational se- 
mantics rules. Thus, the system could immediately be used as an animator for, 
say, CSP, given the operational semantics rules for this language. 

Finally, we would like to see more instances for TAS. Transformational deve- 
lopment and proof in the specification languages Z and CASL should not be too 
hard, since for both embeddings into Isabelle are available [13,16]. The main step 
here is to formalise appropriate notions of refinement. A rather simple different 
instantiation is obtained by turning the refinement relation around. This amo- 
unts to abstracting a concrete program to a specification describing aspects of 
its behaviour, which can then be validated by a model-checker. For example, deadlock
checks using CSP and FDR have been carried out in this manner, where
the abstraction has been done manually [4,5,17]. Thus we believe that TAS represents
an important step towards our ultimate goal of a transformation system
which is similarly flexible with respect to underlying specification languages and 
refinement calculi as Isabelle is for conventional logical calculi. 
Abstract. Thomas has presented a novel proof of the closure of ω-regular
languages under complementation, using weak alternating automata.
This note describes a formalization of this proof in the theorem
prover Isabelle/HOL. As an application we have developed a certified
translation procedure for PTL formulas to weak alternating automata
inside the theorem prover.


1 Introduction 


The close relationship between ω-automata and temporal logic [10,8] is one of
the cornerstones of the theory that underlies model checking. Traditionally,
ω-regular languages have been defined via Büchi automata [1,7,8], a straightforward
extension of standard finite automata, but operating on infinite words. One of the
fundamental results about ω-regular languages establishes their closure under
complementation. It was first proven non-constructively by Büchi [1], relying
on Ramsey’s theorem. Over a period of 25 years, the result has been reproved
several times, using variants of Büchi automata to obtain effective constructions,
culminating in a paper by Safra [5] that gives an essentially optimal, exponential
construction.

An alternative definition of ω-regular languages due to Muller, Saoudi, and
Schupp [3] is based on (weak) alternating automata, for which several states
can be simultaneously active during a run over a given word. In this framework,
complementation can simply be achieved by dualizing the transition relation of
the original automaton, avoiding the exponential blowup of Büchi automata.
Although the construction is simple, its correctness proof is far from obvious. 
Thomas [9] has recently given a beautiful presentation of this proof in terms 
of winning strategies for the class of games associated with weak alternating 
automata, isolating the complexity of the proof in three independent subpro- 
blems. We have formalized Thomas’ proof in the interactive theorem prover 
Isabelle/HOL and believe that this formalization constitutes an interesting case 
study in formalizing mathematics because it involves fairly advanced mathemati- 
cal concepts such as logical games and strategies, offering a mix of combinatorial 
reasoning and of linear and modulus arithmetic that exercise the power of auto- 
mated proof strategies. 

As an application, we have then verified a simple, linear, and compositional 
construction [11,2] that associates a weak alternating automaton $A_\varphi$ with a given
formula $\varphi$ of propositional temporal logic. Because this construction is defined
as a set of recursive functions, we may use Isabelle’s rewriting machinery to 
actually evaluate these functions and in this way obtain a certified translation 
of LTL formulas to automata within the theorem prover. 


2 Isabelle/HOL 


Isabelle is a generic interactive theorem prover which can be instantiated with 
different object logics. One popular and well-developed instance has been deve- 
loped for higher-order logics, based on Church’s version of Higher Order Logic. 
From now on, Isabelle means Isabelle/HOL. Extensive documentation can be 
found on the Web at http://isabelle.in.tum.de; we only highlight some of 
the syntax required for understanding the remainder of the paper. 

We use types constructed by function application ($\Rightarrow$), products (*) or records
(i.e., tuples with named components). Isabelle also supports inductive
definitions of data types.

The syntax for formulas is standard. Isabelle distinguishes between object-level
($\longrightarrow$) and meta-level ($\Longrightarrow$) implication, and similarly for universal
quantification, but that distinction is unimportant for our purposes. The notation
$[\![A_1; \ldots; A_n]\!] \Longrightarrow A$ is short-hand for the nested meta-level implication
$A_1 \Longrightarrow \ldots \Longrightarrow A_n \Longrightarrow A$.

Definitions of types and operators are collected in theories. Non-recursive 
operators are defined via the meta-level equality ($\equiv$), recursive operators can be
introduced by primrec and recdef constructs. 

We do not present any proofs, because Isabelle proof scripts are not intel- 
ligible for human readers, but we usually indicate their complexity in terms of 
how many interactions were necessary. Besides low-level proof commands such 
as resolution and instantiation, Isabelle provides higher-level search procedu- 
res (tactics) based on rewriting and the classical reasoner, which implements a 
tableau-based prover for predicate logic and sets. These automatic tactics must 
be supplied with information about which definitions to expand and which lemmas
to use, which requires some expertise and experimentation.


3 Automata, Games, and Strategies 


We describe the concepts of the theory of weak alternating automata and their 
associated games that are used in Thomas’ proof as well as their formalization 
in Isabelle. The Isabelle definitions have been copied verbatim from the input 
files, except for some pretty-printing to improve readability. 


3.1 Positive Boolean Formulas 


The transition relation of an alternating automaton is conveniently defined via
positive Boolean combinations of its states, represented by the following
inductive data type parameterized by type σ.

datatype σ pboolean =
    Atomic σ
  | And (σ pboolean) (σ pboolean)
  | Or (σ pboolean) (σ pboolean)


Straightforward recursive functions compute the set of atoms that occur in a 
positive Boolean formula, and its models. 


atoms :: σ pboolean ⇒ σ set
atoms (Atomic s) = {s}
atoms (And p q) = (atoms p) ∪ (atoms q)
atoms (Or f g) = (atoms f) ∪ (atoms g)

models :: σ pboolean ⇒ σ set set
models (Atomic s) = {M | s ∈ M}
models (And p q) = (models p) ∩ (models q)
models (Or p q) = (models p) ∪ (models q)


We will mainly be interested in “small” models that are subsets of the atoms 
contained in a formula. 


smodels :: σ pboolean ⇒ σ set set
smodels p ≡ (models p) ∩ ℘(atoms p)


The dual formula $\bar{p}$ of a positive Boolean formula p is obtained by exchanging
conjunctions and disjunctions.


dual_form (Atomic s) = Atomic s
dual_form (And p q) = Or (dual_form p) (dual_form q)
dual_form (Or p q) = And (dual_form p) (dual_form q)


We can prove a number of preliminary lemmas about positive Boolean formulas
and their models. For example, every formula has a model, and small models are
finite sets. A set M is a model of $\bar{p}$ iff it contains an element of every model of p


M ∈ models (dual_form p) = ∀R ∈ models p. ∃s. s ∈ M ∩ R


and a similar relation holds for the small models of $p$ and $\bar{p}$ (Thomas considers
minimal models; his “Remark 3” asserts a similar relationship). All of these
lemmas are proved by induction, followed by invocations of Isabelle’s automated
tactics.
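
As an illustration outside the Isabelle development, the following Python sketch mirrors atoms, smodels and dual_form and checks the duality lemma on one small formula. The tuple encoding of formulas, the helper names holds and hits_all, and the restriction to string atoms (so that they can be sorted) are assumptions made for this example only.

```python
from itertools import combinations

# Formulas are nested tuples: ("atom", s), ("and", p, q), ("or", p, q).

def atoms(f):
    """Set of atoms occurring in a positive Boolean formula."""
    if f[0] == "atom":
        return frozenset([f[1]])
    return atoms(f[1]) | atoms(f[2])

def holds(f, m):
    """True iff the set m is a model of f."""
    if f[0] == "atom":
        return f[1] in m
    if f[0] == "and":
        return holds(f[1], m) and holds(f[2], m)
    return holds(f[1], m) or holds(f[2], m)

def smodels(f):
    """Small models: the models of f that are subsets of atoms(f)."""
    a = sorted(atoms(f))
    return {frozenset(c)
            for k in range(len(a) + 1)
            for c in combinations(a, k)
            if holds(f, frozenset(c))}

def dual_form(f):
    """Exchange conjunctions and disjunctions."""
    if f[0] == "atom":
        return f
    swapped = "or" if f[0] == "and" else "and"
    return (swapped, dual_form(f[1]), dual_form(f[2]))

def hits_all(m, family):
    """True iff m shares an element with every set in family."""
    return all(m & r for r in family)

# p = a AND (b OR c); its dual is a OR (b AND c).
p = ("and", ("atom", "a"), ("or", ("atom", "b"), ("atom", "c")))
small = smodels(p)
small_dual = smodels(dual_form(p))
```

Here the set {b, c} is a small model of the dual formula because it meets each of the three small models of p, whereas {b} is not, since it misses {a, c}.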


3.2 Automata and Runs 


A weak alternating automaton $A = (S, s_0, \delta, \rho)$ over alphabet $B$ is given by a
set $S$ of states, an initial state $s_0 \in S$, a transition function $\delta$ that associates a
positive Boolean formula $\delta(s,b)$ with every state $s \in S$ and input symbol $b \in B$,
and a ranking function $\rho : S \to \mathbb{N}$ such that $\rho(t) \le \rho(s)$ whenever $t$ occurs in
some formula $\delta(s,b)$ (this restriction is what makes A “weak”). We represent
automata via a record type (σ,β) waa and a well-formedness predicate is_waa.

record (σ,β) waa =
  states :: σ set
  initial :: σ
  trans :: [σ, β] ⇒ σ pboolean
  rank :: σ ⇒ nat

is_waa :: (σ,β) waa ⇒ bool
is_waa auto ≡ initial auto ∈ states auto
  ∧ ∀s ∈ states auto. ∀b.
      atoms (trans auto s b) ⊆ states auto
      ∧ ∀t ∈ atoms (trans auto s b).
          rank auto t ≤ rank auto s
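
To make the well-formedness condition concrete, here is a hedged Python sketch of is_waa. It assumes a finite alphabet passed explicitly (the Isabelle predicate quantifies over all values of type β), a dictionary in place of the record, and the tuple encoding ("atom", s), ("and", p, q), ("or", p, q) for formulas; the example automaton for "eventually a" is invented for this illustration.

```python
def atoms(f):
    """Set of atoms occurring in a positive Boolean formula."""
    if f[0] == "atom":
        return frozenset([f[1]])
    return atoms(f[1]) | atoms(f[2])

def is_waa(auto, alphabet):
    """Check the initial state, closure of transitions, and the rank condition."""
    states, rank = auto["states"], auto["rank"]
    if auto["initial"] not in states:
        return False
    for s in states:
        for b in alphabet:
            succ = atoms(auto["trans"](s, b))
            if not succ <= states:
                return False
            # Weakness: ranks must not increase along transitions.
            if any(rank(t) > rank(s) for t in succ):
                return False
    return True

def eventually_a_trans(s, b):
    """Stay in 'wait' until the letter 'a' is read, then stay in 'done'."""
    if s == "wait" and b != "a":
        return ("atom", "wait")
    return ("atom", "done")

eventually_a = {
    "states": frozenset({"wait", "done"}),
    "initial": "wait",
    "trans": eventually_a_trans,
    "rank": {"wait": 1, "done": 0}.get,
}

# Swapping the ranks violates the rank condition on the move wait -> done.
bad_ranks = dict(eventually_a, rank={"wait": 0, "done": 1}.get)
```

The state wait carries the odd rank 1 so that a run which never leaves it is rejected under the acceptance condition defined in this section.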


Runs of alternating automata over ω-words are often represented as infinite
trees of states where the successors of a state $s$ and input symbol $b$ are given by
some model of $\delta(s, b)$. Thomas suggests instead to represent runs as infinite dags,
which can be formalized as a record that contains the root state and a function
that returns the set of successor states for state s at a given depth in the dag.
We also define two type synonyms that will be used to represent ω-words and
other infinite sequences.


record σ dag =
  root :: σ
  succs :: [nat, σ] ⇒ σ set

types
  β word = nat ⇒ β
  σ seq = nat ⇒ σ

The following definitions introduce the set of run dags of an automaton A 
over a given word w and the set of paths through an infinite dag: 


run_dags :: [(σ,β) waa, β word] ⇒ σ dag set
run_dags auto w ≡
  {dg | root dg = initial auto
      ∧ ∀i s. succs dg i s ∈ smodels (trans auto s (w i))}

paths :: σ dag ⇒ (σ seq) set
paths dg ≡ {pi | pi 0 = root dg
               ∧ ∀i. pi (i+1) ∈ succs dg i (pi i)}

It remains to define the acceptance condition. Because ranks are non-increasing
along any path in a run dag, they must eventually stabilize. Now, the run of
a weak alternating automaton is accepting iff for every path through the dag,
this “limit rank” is even. Equivalently, the least rank assumed along any path
through the dag must be even.¹

¹ ∘ denotes function composition; the range of a function f is denoted by range f.


ranks :: [σ seq, σ ⇒ nat] ⇒ nat set
ranks pi f ≡ range (f ∘ pi)

least_rank :: [σ seq, σ ⇒ nat] ⇒ nat
least_rank pi f ≡ LEAST i. i ∈ ranks pi f

acc_path :: [σ seq, (σ,β) waa] ⇒ bool
acc_path pi auto ≡ (least_rank pi (rank auto)) mod 2 = 0

is_accepting :: [(σ,β) waa, σ dag] ⇒ bool
is_accepting auto dg ≡ ∀pi ∈ paths dg. acc_path pi auto
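
The equivalence between the limit rank and the least rank can be tried out on ultimately periodic paths. The Python sketch below is only an illustration: it assumes a path is given as a finite prefix followed by a loop that repeats forever, and the state names and ranks are invented.

```python
def ranks(prefix, loop, rank):
    """All ranks assumed along the path prefix . loop . loop . ..."""
    return {rank(s) for s in list(prefix) + list(loop)}

def least_rank(prefix, loop, rank):
    return min(ranks(prefix, loop, rank))

def limit_rank(loop, rank):
    """Rank at which a path with non-increasing ranks stabilizes."""
    return rank(loop[-1])

def acc_path(prefix, loop, rank):
    """A path is accepting iff its least rank is even."""
    return least_rank(prefix, loop, rank) % 2 == 0

rank = {"q2": 2, "q1": 1, "q0": 0}.get
good = (["q2", "q1"], ["q0"])   # ranks 2, 1, 0, 0, 0, ...
bad = (["q2"], ["q1", "q1"])    # ranks 2, 1, 1, 1, ...
```

Both example paths have non-increasing ranks, so the least rank coincides with the rank on the loop; the first path is accepting and the second is not.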


Finally, the language defined by an automaton is the set of words for which there 
exists some accepting run dag. 


language :: (σ,β) waa ⇒ β word set
language auto ≡ {w | ∃dg ∈ run_dags auto w. is_accepting auto dg}


3.3 Games and Strategies


Logical games have become popular tools in semantics as well as in automata
theory. The word problem for a weak alternating automaton A and an ω-word w
can be visualized as a two-person game between A(utomaton) and P(athfinder)
where A tries to demonstrate the existence of an accepting run, while P tries
to spoil A’s efforts. Every draw in the game consists of a move by A, followed
by a move of P. For his i’th move, player A sees a state s of the automaton
and chooses some (small) model M of $\delta(s, w_i)$, exploiting the non-determinism
of automaton A. Player P, trying to find some path in the run whose minimum
rank is odd, then chooses some state $t \in M$ for the next round of the game. The
outcome of the game is a path through a run dag of A for input w.

We start by defining type synonyms for the positions of the two players. A 
game is represented as an w-sequence of pairs of positions. 


types
  σ Apos = σ
  σ Ppos = σ set
  σ play = (σ Apos * σ Ppos) seq


These definitions differ from those used by Thomas in that he defines positions
as pairs $(i, s)$ resp. $(i, M)$ where $i$ gives the index of the current round. Doing so
would pollute our terms with projection functions, so we provide i as an extra
parameter whenever necessary. The following functions define the set of legal
moves of either player at any position, as well as the initial position.


Amoves :: [(σ,β) waa, β word, nat, σ Apos] ⇒ σ Ppos set
Amoves auto w i s ≡ smodels (trans auto s (w i))

Pmoves :: [(σ,β) waa, β word, nat, σ Ppos] ⇒ σ Apos set
Pmoves auto w i M ≡ M

init_pos :: (σ,β) waa ⇒ σ Apos
init_pos auto ≡ initial auto


A draw sequence is an ω-sequence of alternating positions that correspond
to legal moves; a play is a draw sequence that starts at the initial position. The
outcome of a play is its projection on the positions of player A, i.e. on the states
of the automaton.²


drawseqs :: [(σ,β) waa, β word, σ Apos] ⇒ σ play set
drawseqs auto w s ≡
  { pl | fst (pl 0) = s
       ∧ ∀i. snd (pl i) ∈ Amoves auto w i (fst (pl i))
       ∧ ∀i. fst (pl (i+1)) ∈ Pmoves auto w i (snd (pl i)) }

plays :: [(σ,β) waa, β word] ⇒ σ play set
plays auto w ≡ drawseqs auto w (init_pos auto)

outcome :: σ play ⇒ σ seq
outcome pl ≡ fst ∘ pl

outcomes :: [(σ,β) waa, β word] ⇒ σ seq set
outcomes auto w ≡ outcome ‘‘ (plays auto w)


Player A wins if the least rank among the states in the outcome of the play
is even, otherwise P wins.


Awins :: [(σ,β) waa, σ play] ⇒ bool
Awins auto pl ≡ acc_path (outcome pl) auto

Pwins :: [(σ,β) waa, σ play] ⇒ bool
Pwins auto pl ≡ ¬ (Awins auto pl)


The question of interest is whether either player can force a win in the play 
for the given automaton and input word by following a strategy. It turns out 
that for weak alternating automata, it suffices to consider local (i.e., memoryless) 
strategies, where the next move is determined from the current position alone. 
We introduce types and well-formedness predicates for strategies in Isabelle.³


types
  σ Astrat = [nat, σ Apos] ⇒ σ Ppos
  σ Pstrat = [nat, σ Ppos] ⇒ σ Apos

isAstrat :: [σ Astrat, (σ,β) waa, β word] ⇒ bool
isAstrat strat auto w ≡ ∀i s. strat i s ∈ Amoves auto w i s

isPstrat :: [σ Pstrat, (σ,β) waa, β word] ⇒ bool
isPstrat strat auto w ≡ ∀i M. strat i M ∈ Pmoves auto w i M

² fst and snd denote the projection functions. The expression f ‘‘ S denotes the image
of set S under f.
³ These definitions will be revised in section 5.


A strategy is a winning strategy for either player if it guarantees a win 
provided the player adheres to the strategy. We give the definitions for player 
A; those for player P are similar. 


Aadheres :: [σ play, σ Astrat] ⇒ bool
Aadheres pl strat ≡ ∀i. snd (pl i) = strat i (fst (pl i))

isAwinStrat :: [(σ,β) waa, β word, σ Astrat] ⇒ bool
isAwinStrat auto w strat ≡
  isAstrat strat auto w
  ∧ ∀pl ∈ plays auto w. Aadheres pl strat ⟶ Awins auto pl
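
Plays are infinite objects, but a finite prefix already shows how the two local strategies interact. The Python sketch below is an illustration under several assumptions: formulas use the tuple encoding ("atom", s), ("and", p, q), ("or", p, q), the function play_prefix and the example automaton for "eventually a" are invented here, and P's strategy has the type from this section (round index and P-position only).

```python
from itertools import combinations

def atoms(f):
    if f[0] == "atom":
        return frozenset([f[1]])
    return atoms(f[1]) | atoms(f[2])

def holds(f, m):
    if f[0] == "atom":
        return f[1] in m
    if f[0] == "and":
        return holds(f[1], m) and holds(f[2], m)
    return holds(f[1], m) or holds(f[2], m)

def smodels(f):
    a = sorted(atoms(f))
    return {frozenset(c) for k in range(len(a) + 1)
            for c in combinations(a, k) if holds(f, frozenset(c))}

def play_prefix(trans, word, a_strat, p_strat, start, rounds):
    """Outcome (sequence of A-positions) of the first `rounds` draws."""
    outcome = [start]
    for i in range(rounds):
        s = outcome[-1]
        m = a_strat(i, s)                       # A picks a small model
        if m not in smodels(trans(s, word(i))):
            raise ValueError("illegal A move")
        t = p_strat(i, m)                       # P picks a state of that model
        if t not in m:
            raise ValueError("illegal P move")
        outcome.append(t)
    return outcome

def trans(s, b):
    """Automaton for 'eventually a': wait until 'a' is read."""
    if s == "wait" and b != "a":
        return ("atom", "wait")
    return ("atom", "done")

rank = {"wait": 1, "done": 0}

def word(i):
    return "a" if i == 2 else "b"               # the word b b a b b ...

def a_strat(i, s):
    return min(smodels(trans(s, word(i))), key=len)

def p_strat(i, m):
    return min(m)

outcome = play_prefix(trans, word, a_strat, p_strat, "wait", 4)
```

In this run the play reaches the rank-0 state done once the letter a has been read, so the least rank along the outcome is even and player A wins.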


We have now assembled enough definitions to test them by proving some 
theorems. More constructions will be given as we go along. 


4 Acceptance and Winning Strategies 


The first subgoal is to establish the following lemma, reproduced from [9]:


Proposition 2. The weak alternating automaton A accepts w iff player A has 
a local winning strategy in the game associated with A and w. 


The proof of the “only if” part requires the definition of a winning strategy 
for player A, given an accepting run (dag) of A over w. The idea is to let A force 
the outcome of the play to be a path in the given dag by choosing, for any given 
position, the successors of that state in the dag. Formally, we enter the goal 


⟦ dg ∈ run_dags auto w; is_accepting auto dg ⟧
⟹ ∃strat. isAwinStrat auto w strat


For the proof, we simply provide the “witness” term succs dg for the existential 
quantifier. Isabelle’s automatic tactics are able to solve the resulting subgoal by 
expanding the necessary definitions. 

For the other direction we must construct an accepting dag, given a winning 
strategy for player A. Again, we simply let the successors in the dag be defined 
by the strategy and prove that the dag 


(| root = initial auto, succs = strat |) 


is accepting if strat is a winning strategy for player A. The machine proof 
follows the same pattern outlined above. Note that the concise form of these 
statements and the natural proof pattern is possible because we are working in 
a higher-order setting. 
5 Dualizing Automata and Strategies


Encouraged by the success of the proof of the first subproblem, we continue to
follow Thomas’ exposition. His second lemma connects strategies for an automaton
A with those for its “dual” automaton $\bar{A}$ obtained by dualizing the transition
relation and incrementing the ranks.


dual :: (σ,β) waa ⇒ (σ,β) waa
dual auto ≡ (| states = states auto,
               initial = initial auto,
               trans = λ s b. dual_form (trans auto s b),
               rank = λ s. (rank auto s)+1 |)


It is easy to see (and proved automatically by Isabelle) that $\bar{A}$ is well-formed
if A is.
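
The dualization of an automaton is a small transformation, which the following Python sketch reproduces for illustration. The dictionary representation of automata, the tuple encoding of formulas and the three-state example are assumptions of this sketch, not part of the Isabelle theories.

```python
def dual_form(f):
    """Exchange conjunctions and disjunctions in a positive Boolean formula."""
    if f[0] == "atom":
        return f
    swapped = "or" if f[0] == "and" else "and"
    return (swapped, dual_form(f[1]), dual_form(f[2]))

def dual(auto):
    """Dualize the transition formulas and increment all ranks."""
    return {
        "states": auto["states"],
        "initial": auto["initial"],
        "trans": lambda s, b: dual_form(auto["trans"](s, b)),
        "rank": lambda s: auto["rank"](s) + 1,
    }

def example_trans(s, b):
    """From x move to both y and z; y and z loop forever."""
    if s == "x":
        return ("and", ("atom", "y"), ("atom", "z"))
    return ("atom", s)

example = {
    "states": frozenset({"x", "y", "z"}),
    "initial": "x",
    "trans": example_trans,
    "rank": {"x": 2, "y": 1, "z": 0}.get,
}
dualized = dual(example)
```

Incrementing every rank preserves the ordering of ranks, so the rank condition of is_waa carries over, while the parity of every rank flips.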

Our main goal is to show that winning strategies for player A in the game
associated with automaton A give rise to winning strategies for P in the game for
$\bar{A}$ and vice versa, as asserted by the following proposition taken from Thomas’
paper:


Proposition 4. Player A has a local winning strategy in the game associated
with A and word w iff player P has a local winning strategy in the game for $\bar{A}$
and w.


Looking at Thomas’ proof, we find the following description (adapted to our
notation) of how to construct a winning strategy for P in the game for $\bar{A}$ from
a given strategy for A in the game for A:


Note that in fixing the strategy it suffices to consider only game positions
$(i, M)$ which are reachable [...] The set $M$ of the game position $(i, M)$ is
produced by player A from a game position $(i, s)$ such that $M \in \bar{\delta}(s, w_i)$.
Player P chooses such a state $s$ which could produce $M$ via $w_i$. Now in the
game associated with A at position $(i, s)$, the given local winning strategy of
A picks some $R \in \delta(s, w_i)$, and the definition of $\bar{\delta}$ ensures that there is a state
in $R \cap M$. For his move from the game position $(i, M)$, player P chooses such
a state [...]


The idea is that player P forces the outcome of the game to be a possible 
outcome of the game of A; since the minimum rank of that outcome must have 
been even with A’s ranking function, it will now be odd. However, the construc- 
tion described by Thomas requires P to take into account the position (i, s) from 
which player A determined his move in the current draw of the play, which is not 
allowed by our definition of a local strategy from section 3.3. The first sentence 
in the quote above might suggest to try an inductive definition, but this also 
fails. We therefore revise the definition of type Pstrat and the corresponding 
well-formedness predicate as follows: 


types
  σ Pstrat = [nat, σ Apos, σ Ppos] ⇒ σ Apos

isPstrat :: [σ Pstrat, (σ,β) waa, β word] ⇒ bool
isPstrat strat auto w ≡
  ∀i s M. M ∈ Amoves auto w i s ⟶ strat i s M ∈ Pmoves auto w i M


With this revised definition, the construction becomes a direct transcription 
of Thomas’ description: player P chooses some state that is both in the set of 
states chosen by A and in the set the original strategy would have chosen from 
A’s transition function; such a state must exist by the relationship between the 
models of a positive Boolean formula and its dual. In Isabelle this construction 
is defined by the expression⁴


dualizeAstrat :: [(σ,β) waa, β word, σ Astrat] ⇒ σ Pstrat
dualizeAstrat auto w strat ≡ λi s M. εt. t ∈ (M ∩ strat i s)
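
The construction can be illustrated by the Python sketch below. It is not the Isabelle definition: Hilbert's choice operator is replaced by the deterministic stand-in min, formulas use the tuple encoding ("atom", s), ("and", p, q), ("or", p, q), and the one-state example with transition formula a ∧ (b ∨ c) is invented for this purpose.

```python
from itertools import combinations

def atoms(f):
    if f[0] == "atom":
        return frozenset([f[1]])
    return atoms(f[1]) | atoms(f[2])

def holds(f, m):
    if f[0] == "atom":
        return f[1] in m
    if f[0] == "and":
        return holds(f[1], m) and holds(f[2], m)
    return holds(f[1], m) or holds(f[2], m)

def smodels(f):
    a = sorted(atoms(f))
    return {frozenset(c) for k in range(len(a) + 1)
            for c in combinations(a, k) if holds(f, frozenset(c))}

def dual_form(f):
    if f[0] == "atom":
        return f
    swapped = "or" if f[0] == "and" else "and"
    return (swapped, dual_form(f[1]), dual_form(f[2]))

def dualize_a_strat(a_strat):
    """Turn A's strategy on the automaton into P's strategy on its dual."""
    def p_strat(i, s, m):
        common = m & a_strat(i, s)
        if not common:
            raise ValueError("m is not a model of the dual formula")
        return min(common)  # deterministic stand-in for the choice operator
    return p_strat

# One state "s" whose transition formula is a AND (b OR c) for every letter.
delta = ("and", ("atom", "a"), ("or", ("atom", "b"), ("atom", "c")))

def a_strat(i, s):
    return frozenset({"a", "b"})  # a small model of delta

p_strat = dualize_a_strat(a_strat)
answers = {m: p_strat(0, "s", m) for m in smodels(dual_form(delta))}
```

For every small model M of the dual formula the intersection with A's choice {a, b} is non-empty, so the dualized strategy always has a legal answer; against M = {b, c} it answers b.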


Isabelle proves automatically that the dualized strategy is well-formed if the
original strategy is. It remains to prove that it is a winning strategy for P in the
game for $\bar{A}$ if the original strategy is a winning strategy for A in the game for
A. We state the goal


isAwinStrat auto w strat
⟹ isPwinStrat (dual auto) w (dualizeAstrat auto w strat)


The main problem is to prove that for every outcome of the dualized strategy 
there is a play, obtained by applying the original strategy, on the original au- 
tomaton with the same outcome; that play must therefore be won by A. This 
proof requires a little guidance and takes 10 interactions. 


Thomas asserts that “the other direction is shown analogously, by exchanging
the roles of A and $\bar{A}$”. Isabelle of course requires an explicit construction, and
after the revision of type Pstrat, the symmetry is even less obvious. Fortunately,
the construction turns out to be quite simple, and there is no need for further
modifications to our definitions.


dualizePStrat :: [(σ,β) waa, β word, σ Pstrat] ⇒ σ Astrat
dualizePStrat auto w strat ≡
  λi s. (strat i s) ‘‘ (smodels (trans auto s (w i)))


The proof that this construction yields a winning strategy for player A on $\bar{A}$
given a winning strategy for P on A is very similar to the proof described above.


6 Determinacy 


The third subproblem in Thomas’ proof consists in showing that all games are 
determined, assuming optimal play. 


Proposition 5. Let A be a weak alternating automaton. From any position in 
the game associated with A and word w, either player A or P has a local winning 
strategy. 


⁴ ε denotes Hilbert’s choice operator.

6.1 Attractor Sets and Associated Strategies


The central notion used in the proof is that of an attractor set. This is a set
of positions (for player A) from which either A or P can force, in finitely many
draws, a visit to a set of target positions. Attractor sets for player A are defined
inductively as follows:


$\mathrm{attr}_A^{0}(T) = T$
$\mathrm{attr}_A^{d+1}(T) = \mathrm{attr}_A^{d}(T) \cup \{(i,s) \mid \text{for some A-move from } (i,s) \text{ all P-moves lead to } \mathrm{attr}_A^{d}(T)\}$


The set $\mathrm{attr}_A(T)$ is defined as the union of all $\mathrm{attr}_A^{d}(T)$. The definition of
$\mathrm{attr}_P(T)$ is dual in that $\mathrm{attr}_P^{d+1}(T)$ contains those positions for which every
A-move leads to a position such that P has some move towards $\mathrm{attr}_P^{d}(T)$. We omit
reproducing the formalization of these definitions in Isabelle, which is
straightforward.
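
On a finite arena both attractor sets can be computed by a simple fixpoint iteration. The Python sketch below is meant as an illustration only: it assumes finitely many A-positions (whereas the positions in the paper are pairs of a round index and a state), and it represents the game by a hypothetical dictionary moves that maps each A-position to the list of sets that A may choose, each set being the positions P can move to.

```python
def attractor_a(moves, target):
    """Positions from which A can force a visit to target."""
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for pos, options in moves.items():
            # Some A-move such that all P-moves lead into attr.
            if pos not in attr and any(m <= attr for m in options):
                attr.add(pos)
                changed = True
    return attr

def attractor_p(moves, target):
    """Positions from which P can force a visit to target."""
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for pos, options in moves.items():
            # Every A-move leaves P some move into attr.
            if pos not in attr and all(m & attr for m in options):
                attr.add(pos)
                changed = True
    return attr

fs = frozenset
moves = {
    0: [fs({1, 2})],
    1: [fs({4})],
    2: [fs({3}), fs({4})],
    3: [fs({3})],
    4: [fs({4})],
}
a_attr = attractor_a(moves, {4})
p_attr = attractor_p(moves, {1})
```

With target {4}, positions 1 and 2 enter the attractor in the first round and position 0 in the second, mirroring the distance $d$ of the inductive definition; position 3 never enters. The sketch presupposes that every position has at least one A-move and that all moves are non-empty sets.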

Thomas writes: “From the positions in $\mathrm{attr}_A(T)$ player A can force a decrease
of distance to $T$ in each step (which defines a local strategy)”. Let us make this
explicit:⁵


attrA_strat :: [(σ,β) waa, β word, (nat * σ Apos) set] ⇒ σ Astrat
attrA_strat auto w T ≡
  λi s. εM. M ∈ Amoves auto w i s
          ∧ ∀d. (i,s) ∈ pick_ea (Amoves auto w) (Pmoves auto w)
                               (attrAh d auto w T)
                ⟶ ∀t∈M. ∃e. e ≤ d
                          ∧ (i+1,t) ∈ attrAh e auto w T


It is easy to prove that this is a well-defined strategy, although the quantifiers 
are now nested deeply enough for Isabelle to require some guidance. It remains 
to show that player A can force, via this strategy, a visit in the set T of target 
positions: 


⟦ pl ∈ drawseqs auto w s; (i, fst (pl i)) ∈ attrA auto w T;
  ∀j. i ≤ j ⟶ snd (pl j) = attrA_strat auto w T j (fst (pl j)) ⟧
⟹ ∃j. (i+j, fst (pl (i+j))) ∈ T


This assertion is proved by a rather straightforward induction on the
distance $d$ in the definition of $\mathrm{attr}_A(T)$. More interesting is the formalization of
the next sentence in Thomas’ proof: “Also note that for any game position
outside $\mathrm{attr}_A(T)$, player P will be able to avoid entering this set [...]; otherwise
the position would be already in $\mathrm{attr}_A(T)$ itself. So from outside $\mathrm{attr}_A(T)$, P can
avoid, by a local strategy, to enter this set and can hence avoid the visit of T.”
We boldly formulate a strategy (for P) that avoids entering $\mathrm{attr}_A(T)$:


CattrA_strat :: [(σ,β) waa, β word, (nat * σ Apos) set]
                ⇒ σ Pstrat
CattrA_strat auto w T ≡
  λi s M. εt. t ∈ M
            ∧ ((i,s) ∉ attrA auto w T ⟶ (i+1,t) ∉ attrA auto w T)

⁵ The auxiliary function pick_ea formalizes the body of the definition of $\mathrm{attr}_A^{d+1}(T)$.


Why is this a well-defined strategy? Assume that $(i+1,t) \in \mathrm{attr}_A(T)$ holds
for all possible moves $t$ from $M$; that is, for every such $t$ there exists some $d$ such
that $(i+1,t) \in \mathrm{attr}_A^{d}(T)$. Assuming that $M \neq \emptyset$ was a legal A-move from state
$s$, we need to prove that this implies $(i,s) \in \mathrm{attr}_A(T)$. It is here that we require
$M$ to be a finite set, for then there is some $d$ such that $(i+1,t) \in \mathrm{attr}_A^{d}(T)$ for
all $t \in M$, and therefore we obtain $(i,s) \in \mathrm{attr}_A^{d+1}(T) \subseteq \mathrm{attr}_A(T)$. Without this
finiteness assumption, we would only be able to prove $(i,s) \in \mathrm{pick\_ea}(\mathrm{attr}_A(T))$,
which need not be a subset of $\mathrm{attr}_A(T)$.

It is only at this point in the proof (and in the similar proof for player P) 
where finiteness plays a role, and we therefore obtain a slight generalization of 
Thomas’ result in that the set of states need not be finite, although the transition 
relations must be defined by finitary formulas. 

With this observation, the required well-formedness and correctness conditions
are not too hard to establish. In fact, their proofs are made more perspicuous
by observing the following algebraic facts about attractor sets.


T ⊆ U ⟹ attrA auto w T ⊆ attrA auto w U
is_waa auto ⟹ attrA auto w (attrA auto w T) = attrA auto w T


The definitions and proofs for strategies forcing or avoiding visits to $\mathrm{attr}_P(T)$
can mostly be obtained by cut-and-paste: the same tactics can also handle the
dual quantifier combinations.


6.2 Proving Determinacy


Completing the proof of proposition 5, Thomas lets $Q_i$ denote the set of states
with rank $i$ and $Pos_0$ the set of all positions in the game for automaton A and
word w, and continues:


Clearly, from the positions in $A_0 := \mathrm{attr}_A(Q_0)$, Automaton can force, by a
local strategy, to reach states of rank 0 and thus win. Consider the subgame
whose set of positions is $Pos_1 := Pos_0 \setminus A_0$ (all of which have rank $\ge 1$). From
the positions in $A_1 := \mathrm{attr}_P(Pos_1 \cap Q_1)$, Pathfinder can force, again by a local
strategy to reach (and stay in) states of rank 1 and hence win. [...] In this
way we continue: In the game with position set $Pos_2 := Pos_1 \setminus A_1$ (containing
only states of rank $\ge 2$) we form the attractor set $A_2 := \mathrm{attr}_A(Pos_2 \cap Q_2)$,
etc. Then the positions from which Automaton wins (by the local attractor
strategies) are those in the sets $A_i$ with even $i$. Similarly, Pathfinder wins from
the positions in the set $A_i$ with odd $i$ (again by his local attractor strategies).


Instead of formalizing the induction principle based on “subgames” that
underlies this argument, we give a direct recursive definition of the sets $A_i$:⁶


rankSet :: [σ ⇒ nat, nat] ⇒ (nat * σ) set
rankSet rk r ≡ { (i,s) | rk s = r }

determ_pos :: [nat, (σ,β) waa, β word] ⇒ (nat * σ Apos) set
recdef determ_pos "less_than"
  determ_pos 0 = λauto w. attrA auto w (rankSet (rank auto) 0)
  determ_pos 1 = λauto w. attrP auto w ((rankSet (rank auto) 1)
                                        \ (determ_pos 0 auto w))
  determ_pos (Suc (Suc k)) = λauto w.
    let pos = determ_pos k auto w
              ∪ ((rankSet (rank auto) (Suc (Suc k)))
                 \ (determ_pos (Suc k) auto w))
    in
      if (k mod 2 = 0) then attrA auto w pos else attrP auto w pos

⁶ Our definition ensures determ_pos i ⊆ determ_pos (i+2), unlike Thomas’ definition.


We must also define the strategy player A should apply for positions in the set
$A_{2k}$.


A_strat :: [nat, (σ,β) waa, β word] ⇒ σ Astrat
primrec
  A_strat 0 auto w =
    λi s. if (i,s) ∈ rankSet (rank auto) 0
          then εS. S ∈ Amoves auto w i s
          else attrA_strat auto w (rankSet (rank auto) 0) i s

  A_strat (Suc k) auto w =
    λi s. if (i,s) ∈ determ_pos (k+k) auto w
          then A_strat k auto w i s
          else if (i,s) ∈ rankSet (rank auto) (Suc (Suc (k+k)))
                          \ (determ_pos (Suc (k+k)) auto w)
          then CattrP_strat auto w (determ_pos (Suc (k+k)) auto w) i s
          else attrA_strat auto w
                 (determ_pos (k+k) auto w
                  ∪ (rankSet (rank auto) (Suc (Suc (k+k)))
                     \ (determ_pos (Suc (k+k)) auto w))) i s


These definitions may look intimidating, but the idea is quite simple: given a
position $(i,s)$ in $A_0 = \mathrm{attr}_A(Q_0)$, either the rank of $s$ is already 0, in which case
any continuation ensures that A wins, or player A applies the attractor
strategy that is known to force a visit in $Q_0$. Given a position $(i,s) \in A_{2k+2}$
there are three cases: First, if in fact $(i,s) \in A_{2k}$ holds, A applies the strategy
defined for positions in $A_{2k}$, which is already known by induction hypothesis.
Second, if $(i,s) \in Q_{2k+2} \setminus A_{2k+1}$ (the “kernel” of $A_{2k+2}$) then A avoids positions
in $A_{2k+1}$; this ensures that the ranks of subsequent positions will be even and
$\le 2k+2$. Otherwise, player A applies the attractor strategy for $A_{2k+2}$ to force
an eventual visit in the kernel.

Continuing the formal development, we first prove that every position is
contained in some $A_k$:


∃k. k ≤ rank auto s ∧ (i,s) ∈ determ_pos k auto w


The proof is by case distinction on whether the rank $r$ of $s$ is 0, 1, or $i+2$ for
some $i \in \mathbb{N}$, expansion of the appropriate clause in the definition of $A_r$, and
further case distinctions guided by the definitions, taking 13 interactions. More
difficult is the proof of the following lemma, which asserts that strategy A_strat,
applied to some position in $A_{2k}$, forces the play to remain in some position in
$A_{2l}$ for some $l \le k$.


⟦ is_waa auto; pl ∈ drawseqs auto w s;
  (i, fst (pl i)) ∈ determ_pos (k+k) auto w;
  snd (pl i) = A_strat k auto w i (fst (pl i)) ⟧
⟹ ∃l. l ≤ k ∧ (i+1, fst (pl (i+1))) ∈ determ_pos (l+l) auto w


The proof is essentially by induction on k, but requires 44 interactions, including 
many low-level instantiations. Of similar complexity is the proof of the overall 
correctness of the strategy, which takes 34 interactions. 


⟦ is_waa auto; pl ∈ drawseqs auto w s;
  (i, fst (pl i)) ∈ determ_pos (k+k) auto w;
  ∀j. i ≤ j ⟶ snd (pl j) = A_strat k auto w j (fst (pl j)) ⟧
⟹ Awins auto pl


It remains to define a similar winning strategy for player P when starting
from a position in $A_{2k+1}$, and reprove the analogous theorems. The basic ideas
are the same, but due to the low-level nature of some of the proof scripts, they
cannot be copied blindly. Since obviously there cannot be a winning strategy
for both players for some position, it follows that the sets $A_k$ form two disjoint
hierarchies for $k$ even or odd.


6.3 Complementation


We can now prove the complementation theorem:


is_waa auto ⟹ language (dual auto) = - language auto


The proof is simple, given the previous lemmas: automaton A does not accept
input word w iff (by proposition 2) player A does not have a local winning
strategy in the corresponding game iff (by proposition 4) player P does not
have a local winning strategy in the game associated with $\bar{A}$ and w iff (by
proposition 5) player A has a local winning strategy in that game iff (again by
proposition 2) $\bar{A}$ accepts w.
7 From Temporal Logic to Weak Alternating Automata


As an application of our definitions and results, we now formalize a translation
of formulas of propositional temporal logic of linear time into weak alternating
automata [4,11]. In contrast to the standard translation of PTL to Büchi
automata, this construction is compositional and of linear complexity; the hardest
step in the correctness proof is that for negation, which is based on the theorem
we have just proven.


7.1 Propositional Temporal Logic 


We begin with the definition of a deep embedding of PTL (over a type α of
atomic propositions) in Isabelle:⁷


datatype α ptl =
    TRUE | Var α | NOT (α ptl) | OR (α ptl) (α ptl)
  | NEXT (α ptl) | UNTIL (α ptl) (α ptl)

types α behavior = (α set) word

suffix :: [α behavior, nat] ⇒ α behavior
suffix ρ n ≡ λi. ρ(i+n)

holdsAt :: [α behavior, α ptl] ⇒ bool    ("_ ⊨ _")
ρ ⊨ TRUE = True
ρ ⊨ Var x = (x ∈ ρ 0)
ρ ⊨ NOT f = ¬(ρ ⊨ f)
ρ ⊨ f OR g = ((ρ ⊨ f) ∨ (ρ ⊨ g))
ρ ⊨ NEXT f = suffix ρ 1 ⊨ f
ρ ⊨ f UNTIL g = ∃n. suffix ρ n ⊨ g ∧ ∀m<n. suffix ρ m ⊨ f

models :: α ptl ⇒ (α behavior) set
models f ≡ {ρ | ρ ⊨ f}


We will mainly restrict attention to normal formulas that do not contain 
double negations; this predicate is easily defined as a recursive function, as is a 
normalization function that removes any double negations. We can then prove 
laws about PTL formulas such as 


(ρ ⊨ f UNTIL g) = ((ρ ⊨ g) ∨ (ρ ⊨ f ∧ ρ ⊨ NEXT(f UNTIL g)))
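
The expansion law for UNTIL can be checked experimentally on ultimately periodic behaviors. The Python sketch below is an illustration and makes several assumptions: behaviors are given as a finite prefix followed by a non-empty loop of letters (each letter a set of atomic propositions), formulas are tuples such as ("UNTIL", f, g), and the bounded search in the UNTIL case relies on the fact that suffixes repeat once the loop has been entered.

```python
def letter(prefix, loop, i):
    """The i-th letter of the behavior prefix . loop . loop . ..."""
    if i < len(prefix):
        return prefix[i]
    return loop[(i - len(prefix)) % len(loop)]

def holds(f, prefix, loop, i=0):
    """True iff formula f holds of the i-th suffix of the behavior."""
    op = f[0]
    if op == "TRUE":
        return True
    if op == "Var":
        return f[1] in letter(prefix, loop, i)
    if op == "NOT":
        return not holds(f[1], prefix, loop, i)
    if op == "OR":
        return holds(f[1], prefix, loop, i) or holds(f[2], prefix, loop, i)
    if op == "NEXT":
        return holds(f[1], prefix, loop, i + 1)
    # UNTIL: after this many steps every distinct suffix has been visited.
    horizon = max(len(prefix) - i, 0) + len(loop)
    for n in range(horizon):
        if holds(f[2], prefix, loop, i + n):
            return True
        if not holds(f[1], prefix, loop, i + n):
            return False
    return False

p, q = ("Var", "p"), ("Var", "q")
until = ("UNTIL", p, q)
prefix = [{"p"}, {"p"}]
loop = [{"q"}, set()]

def expansion(i):
    """Right-hand side of the expansion law at position i."""
    return holds(q, prefix, loop, i) or (
        holds(p, prefix, loop, i) and holds(("NEXT", until), prefix, loop, i))

law_holds = all(holds(until, prefix, loop, i) == expansion(i) for i in range(8))
```

On this behavior p UNTIL q holds at position 0, because q becomes true at position 2 while p holds before, and it fails at position 3, where neither p nor q holds.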


We inductively define the set of subformulas of a PTL formula. The
Fischer-Ladner closure $C(\varphi)$ of a formula $\varphi$ is the set that contains all subformulas of
$\varphi$ and their complements. Observe that all formulas in $C(\varphi)$ are normal if $\varphi$ is
normal.


⁷ We use infix syntax for the OR and UNTIL operators.

7.2 Definition of Automaton $A_\varphi$


We associate a weak alternating automaton $A_\varphi$ with every PTL formula $\varphi$. The
set of states is given by the Fischer-Ladner closure of $\varphi$, with $\varphi$ being the initial
state. The transition relation and the ranks are defined by induction on the
formula.⁸


ptl_trans :: [α ptl, α set] ⇒ (α ptl) pboolean
ptl_trans TRUE S = Atomic TRUE
ptl_trans (Var x) S =
  if x ∈ S then Atomic TRUE else Atomic (NOT TRUE)
ptl_trans (NOT f) S = subst complement (dual_form (ptl_trans f S))
ptl_trans (f OR g) S = Or (ptl_trans f S) (ptl_trans g S)
ptl_trans (NEXT f) S = Atomic f
ptl_trans (f UNTIL g) S =
  Or (ptl_trans g S)
     (And (Atomic (f UNTIL g)) (ptl_trans f S))

ptl_rank :: α ptl ⇒ nat
ptl_rank TRUE = 0
ptl_rank (Var x) = 1
ptl_rank (NOT f) = (ptl_rank f)+1
ptl_rank (f OR g) = max (ptl_rank f) (ptl_rank g)
ptl_rank (NEXT f) = ptl_rank f
ptl_rank (f UNTIL g) = let r = max (ptl_rank f) (ptl_rank g)
                       in if r mod 2 = 0 then r+1 else r

ptl_waa :: α ptl ⇒ (α ptl, α set) waa
ptl_waa f ≡
  (| states = fischer_ladner f, initial = f,
     trans = ptl_trans, rank = ptl_rank |)


The definition of the transition relation for an automaton state corresponding
to an until formula follows the recursive expansion law shown above. For a
formula $\neg\varphi$, we take the dual of the Boolean combination associated with $\varphi$,
but complement all atoms. Based on a few lemmas about the Fischer-Ladner
closure, it is easy to prove that automaton $A_\varphi$ is indeed well-formed if $\varphi$ is a
normal PTL formula.


normal f ⟹ is_waa (ptl_waa f)
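
The rank condition behind this well-formedness result can be spot-checked with a Python sketch of the transition and rank functions. Several points are assumptions of this illustration: PTL formulas and positive Boolean formulas are encoded as tuples, and complement is taken to strip a leading NOT and to add one otherwise, a definition that this section does not spell out.

```python
TRUE = ("TRUE",)

def complement(f):
    """Assumed definition: remove a leading NOT, otherwise add one."""
    return f[1] if f[0] == "NOT" else ("NOT", f)

def atoms(p):
    if p[0] == "atom":
        return frozenset([p[1]])
    return atoms(p[1]) | atoms(p[2])

def dual_form(p):
    if p[0] == "atom":
        return p
    return ("or" if p[0] == "and" else "and", dual_form(p[1]), dual_form(p[2]))

def subst(h, p):
    """Apply h to all atoms of the positive Boolean formula p."""
    if p[0] == "atom":
        return ("atom", h(p[1]))
    return (p[0], subst(h, p[1]), subst(h, p[2]))

def ptl_trans(f, S):
    op = f[0]
    if op == "TRUE":
        return ("atom", TRUE)
    if op == "Var":
        return ("atom", TRUE) if f[1] in S else ("atom", ("NOT", TRUE))
    if op == "NOT":
        return subst(complement, dual_form(ptl_trans(f[1], S)))
    if op == "OR":
        return ("or", ptl_trans(f[1], S), ptl_trans(f[2], S))
    if op == "NEXT":
        return ("atom", f[1])
    # UNTIL follows the expansion law: g, or f now and the same formula next.
    return ("or", ptl_trans(f[2], S), ("and", ("atom", f), ptl_trans(f[1], S)))

def ptl_rank(f):
    op = f[0]
    if op == "TRUE":
        return 0
    if op == "Var":
        return 1
    if op == "NOT":
        return ptl_rank(f[1]) + 1
    if op == "OR":
        return max(ptl_rank(f[1]), ptl_rank(f[2]))
    if op == "NEXT":
        return ptl_rank(f[1])
    r = max(ptl_rank(f[1]), ptl_rank(f[2]))
    return r + 1 if r % 2 == 0 else r   # until formulas get an odd rank

def ranks_ok(f, letters):
    """No successor of state f has a larger rank than f itself."""
    return all(ptl_rank(t) <= ptl_rank(f)
               for S in letters for t in atoms(ptl_trans(f, S)))

until = ("UNTIL", ("Var", "p"), ("Var", "q"))
letters = [set(), {"p"}, {"q"}, {"p", "q"}]
```

An until formula receives an odd rank, so a play that stays at the until state forever is lost by player A, which matches the requirement that the right-hand argument must eventually hold.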


Moreover, the automaton associated with the complement of $\varphi$ is isomorphic to
the dual automaton for $\varphi$ modulo a renaming of all its states by complementation
and possibly an adjustment of ranks that does not affect their parity.


⁸ subst h p denotes the positive Boolean formula obtained from p by applying h to
all atoms in p.


ren_auto :: [(σ,β) waa, σ ⇒ τ, (τ,β) waa] ⇒ bool
ren_auto src h tgt ≡
    states tgt = h ‘‘ (states src)
  ∧ initial tgt = h (initial src)
  ∧ ∀s∈states src. ∀b. trans tgt (h s) b = subst h (trans src s b)
  ∧ ∀s∈states src. rank tgt (h s) mod 2 = rank src s mod 2


normal f ⟹
  ren_auto (dual (ptl_waa f)) complement (ptl_waa (complement f))


7.3 Correctness of the Translation 


We now prove that the automaton $A_\varphi$ accepts precisely the models of $\varphi$, for any
normal PTL formula $\varphi$. This requires proving that $(i, \psi)$ is a winning position for
player A in the game associated with $A_\varphi$ and behavior $\rho$ if and only if $\psi$ holds of
the i’th suffix of $\rho$, for any formula $\psi$ in the Fischer-Ladner closure of $\varphi$. The only
non-trivial cases are those for negation and until. For negation, the assertion
follows by the complementation theorem for weak alternating automata, the
observation that $A_{\neg\psi}$ is obtained by renaming $\bar{A}_\psi$ via complementation, and
the fact that complementation is injective for normal formulas. The most tedious
part of the Isabelle proof is to show that an automaton obtained by renaming
via a function that is injective on the states of the original automaton induces
the same winning positions for either player. For a formula $\psi = \psi_1$ until $\psi_2$, it
suffices to observe that any draw sequence starting at $\psi$ remains at $\psi$ until it
simulates a draw sequence for $\psi_1$ or $\psi_2$. Using the recursion law for the until
operator, this justifies the definition of a winning strategy for player A.

Assembling the bits and pieces, we can now prove the correctness of the 
translation from PTL formulas to weak alternating automata: 


language (ptl_waa (normalize f)) = models f 


Isabelle’s facility to state goals containing unknowns, which are instantiated
during the proof, allows us to use this theorem to actually construct automata
$A_\varphi$ inside the theorem prover, based on Isabelle’s built-in rewriting engine. In
this way, we obtain a certified translation procedure from PTL formulas to weak
alternating automata. The evaluation can actually be performed inside Isabelle
because the construction is of linear complexity.


8 Conclusion 


The work on this paper was started out of pure intellectual curiosity:
complementation of ω-automata has been one of the most celebrated problems of the
field, and Thomas’ streamlined presentation of the proof based on weak
alternating automata and their associated games seemed to be the first one that could
actually be carried out using an interactive theorem prover. The usefulness of
such an endeavor may, however, be disputable. This case study has once again
confirmed that at least part of the social process that leads to the acceptance
of a proof can actually be replaced by machine certification, and that even
well-polished hand proofs are still likely to contain errors such as the “type error”
our formalization has revealed.


On the other hand, it cannot be disputed that it still takes some effort to
convert even a very well-presented proof into machine-checkable form: our
formalization of Thomas’ proof took about three weeks, although we were reasonably
familiar with both the subject matter and the tool. Still, this is perhaps not an
inordinate amount of time for a reasonably complex proof.


It is obvious that formalization of mathematical proofs requires attention to 
details that are happily ignored in paper-and-pencil proofs. Still, if formaliza- 
tion in itself is to be of any value, the format of the documents produced by 
machine proofs is also important when discussing formalized mathematics. In 
this respect, our experience has been mixed. On the one hand, we found that 
the formalization of the underlying concepts was fairly natural and simple, due 
to the high expressiveness of the higher-order framework. Concerning the auto- 
mation of proofs, the first part of the work reported here was carried out using 
Isabelle’s standard tactic-based interface. We have found that the automatic 
tactics provided by Isabelle are incompatible with human reasoning capabilities: 
whereas complicated-looking subgoals may succumb to a single tactic invocation, 
other seemingly trivial goals (in particular those that involve arithmetic) need 
tedious low-level interaction. Moreover, small changes to the underlying definiti- 
ons may render a tactic script obsolete. For the translation of PTL to automata, 
we have used Wenzel’s recent Isar interface [12] that is based on an explicit 
proof language. This has been a very positive experience: although documents 
become much more verbose, the user has much more control of the interaction, 
and changes tend to be much easier to apply. 


The main benefit of formalization, however, is exemplified by the certified 
translation of PTL formulas to automata presented in section 7 where effective 
computational procedures have been derived from the purely mathematical con- 
structions arising from the proof. Our translation to automata constitutes the 
first step towards a formalized PTL decision procedure: having constructed the 
automaton, it remains to decide whether its language is empty or not in order 
to decide whether the formula is satisfiable or not. Deciding nonemptiness is 
rather easy using external tools (“oracles” in Isabelle terminology), for example
based on BDD technology. In general, deciding nonemptiness of weak alternating 
automata is PSPACE-complete, and it is therefore less clear how to carry out 
this second step inside a theorem prover. Schneider [6] has presented a similar
translation of PTL formulas to a symbolic representation of Büchi automata.
His translation is also linear, and he also uses an external tool to decide
nonemptiness. We believe that alternating automata may be preferable to Büchi
automata because the translation is compositional, which may be an advantage
for applications such as model checking. For example, one may want to apply
minimizations during the construction of A_φ, or pre-compute the product of the
transition system under investigation with, say, certain fairness conditions.

Graphical Theories of Interactive Systems:
Can a Proof Assistant Help?

Computer scientists are privileged, or doomed, to deal rigorously with large
structures. This happens, of course, with hardware design and verification, and
with programs and specifications. Considerable progress has been made with
mechanised proof assistance for both. Going further into the back room,
programming languages are also big structures. It’s very uncommon to have help
from a proof assistant while actually designing a language, probably because the
very formalism for writing down what a language means is changing under our
feet, so it’s asking too much for those who build proof assistants to keep up with
these developments enough to help the designers in real time. All the same, it
has been encouraging to see plenty of post hoc verification of properties of
Standard ML using its semantic formalism. Perhaps a future language design using
“big step structured operational semantics” could be done using proof assistance
to check out the sanity of a large set of inference rules before they are frozen
into a design.

Going still further into the back room, some computation theories are also 
big structures. We would like to have a model of mobile interactive systems, in 
the form of a calculus in which one can really verify, say, that certain invariants 
are preserved by all activity; an example would be a security assertion. I have 
been developing action calculi — which is such a model — for many years, often 
changing notation and definitions, wanting to see whether a theorem survives 
the changes, wanting to visualise an embedding of one graph in another (since 
action calculi have a graphical presentation), and so on. A proof assistant would 
be a great help if I could keep it up to date with my changes of mind. It’s a 
very hard challenge to ask for tools to help theory development; but it was also 
hard to get machine-assisted proof off the ground at all, and that has been done. 
In my talk I would like to provoke thinking about how this challenge can be
addressed incrementally. Computer scientists developing theories are probably
the best guinea pigs for pilot studies.

Abstract. We describe our formal verification that the Alpha 21364’s network
protocol guarantees delivery and maintains necessary message ordering. We describe
the protocol and its formalization, and the formalization and proof of deadlock 
freedom and liveness. We briefly describe our experience with using three tools 
(SMV, PVS, and TLA+/TLC), with the cost effectiveness of formal methods, and 
with software engineering of formal specs. 


1 Introduction 


Compaq’s Alpha 21364 microprocessor [Ban98] includes features that are hard to verify
with traditional simulation-based methods, particularly glueless multiprocessor
support. Simulation of a large multiprocessor is a challenge [BBJ+95, KN95, TQB+98], and
though the project leaders felt that simulation would adequately verify that the RTL
correctly implemented the multiprocessor protocols, it wouldn’t adequately prove the
absence of deadlock and livelock.

We describe our formal verification that the 21364’s network (transport) protocol 
guarantees delivery and maintains necessary message ordering. (We also formally veri- 
fied the coherence protocol, but don’t describe that here.) We relate our experiences with 
three tools (SMV, PVS, and TLA+/TLC), with the cost effectiveness of formal methods, 
and with software engineering of formal specs. 

Sections 2 and 3 describe the network protocol and its formalization. Sections 4 
and 5 present the formalization and proofs of deadlock freedom and liveness. Section 6 
discusses the results and what we learned. 


2 The Protocol 


The Alpha 21364 includes glueless multiprocessor support — each processor has four 
direct processor-to-processor network links (plus an I/O interface and a Rambus inter- 
face). In a multiprocessor system, each processor links to its neighbors north, south, east, 
and west, with the ends wrapped around to form a 2D torus, as shown in Figure 1. 

A network protocol provides point-to-point message transport within the torus. It 
guarantees message delivery, maintains ordering of I/O traffic for each source-sink pair, 
and can route around some failures. It is used for message delivery by a cache-coherence 
protocol. 

Fig. 1. A 12-processor 21364-based multiprocessor.


Though we verified the real protocol, the protocol and spec fragments shown are 
unverified excerpts and do not represent the protocol’s actual details. In the rest of the 
paper, “the network” and “the protocol” refer to the example rather than the 21364’s 
design. All the points we make are true of both. 

The network uses a store-and-forward protocol, with messages buffered at inter- 
mediate nodes as they hop toward their destinations. Without some mechanism to prevent 
it, a deadlock cycle could appear, with buffers in every node in the cycle full of packets 
waiting for buffer space in the node ahead. An “intra-dimensional” deadlock runs all the
way around the torus in a single direction (and thus in a single row or column). Other
deadlocks are “inter-dimensional,” and may have complex looping paths.

The network protocol avoids intra-dimensional deadlock by providing two virtual 
channels (that is, two buffers) for each direction [DS87]. For each virtual channel (“VC”), 
use of one link in each row or column is prohibited, so no deadlock cycle can form in a 
single VC. Packets travelling in a row or column use one VC or the other for the whole 
trip, so no inter-VC cycle can form in a single row or column. 
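
The intra-dimensional argument can be checked on a toy model. The sketch below is illustrative only: it assumes a one-way ring of n links in which a packet holding link k may wait for link (k+1) mod n, and the names `channel_dependencies` and `has_cycle` are hypothetical, not taken from the 21364 design or its spec.

```python
def channel_dependencies(n, prohibited=None):
    """Dependency graph between the usable links of a one-way ring.

    Link k waits on link (k + 1) % n. A prohibited link is dropped from
    the graph, modelling the one link per row that a virtual channel
    may not use (an assumption of this sketch).
    """
    usable = [k for k in range(n) if k != prohibited]
    return {k: [(k + 1) % n] if (k + 1) % n in usable else [] for k in usable}


def has_cycle(graph):
    """Depth-first search for a cycle in a directed graph given as a dict."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {v: WHITE for v in graph}

    def visit(v):
        colour[v] = GREY
        for w in graph[v]:
            # A grey successor is on the current DFS path: a cycle.
            if colour[w] == GREY or (colour[w] == WHITE and visit(w)):
                return True
        colour[v] = BLACK
        return False

    return any(colour[v] == WHITE and visit(v) for v in graph)
```

With every link usable, the four-link ring has a dependency cycle; removing any single link turns the graph into a path, which is why one virtual channel on its own cannot deadlock in this model.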

Inter-dimensional deadlock is avoided by dimension-order routing [DS87]. One axis 
(north-south, for example) is primary and the other is secondary. Packets complete their 
travel in the primary axis before beginning travel in the secondary. This divides each route 
into two segments, each limited to a single row or column. Deadlock cycles are avoided 
in each segment by the intra-dimensional mechanism, and though packets travelling in 
the primary axis may depend on packets travelling in the secondary, the reverse is not 
true, so no cycle can form between them. 
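
Dimension-order routing itself is simple to sketch. The code below assumes a torus given by its two ring sizes and picks the shorter way round each ring; that choice, and the function names, are assumptions of the sketch (the real protocol takes its directions from routing tables).

```python
def ring_steps(src, dst, size):
    """Unit steps along one ring, taking the shorter way (ties go positive)."""
    forward = (dst - src) % size
    backward = (src - dst) % size
    if forward <= backward:
        return [+1] * forward
    return [-1] * backward


def dimension_order_route(src, dst, dims, primary=0):
    """Nodes visited from src to dst, finishing the primary axis first."""
    secondary = 1 - primary
    pos = list(src)
    path = [tuple(pos)]
    for axis in (primary, secondary):
        for step in ring_steps(pos[axis], dst[axis], dims[axis]):
            pos[axis] = (pos[axis] + step) % dims[axis]
            path.append(tuple(pos))
    return path
```

On a 4 x 3 torus, routing from (0, 0) to (3, 1) with axis 0 as primary wraps around to (3, 0) first and only then moves along axis 1, so each route splits into two single-row or single-column segments as described above.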

These mechanisms provide deadlock freedom for the network protocol, but deadlock 
problems reappear in the cache coherence protocol. To avoid them, coherence protocol 
messages are grouped into six classes. Each message can be completed using messages 
only from lower classes. The network protocol provides separate sets of buffers for each 
coherence-protocol message class, in effect providing independent networks. 

3 The Formal Spec in PVS


We specify the protocol and the properties it guarantees in the TLA+ style (but with 
PVS), with a state space, initial condition, set of actions, and set of fairness conditions. 
The state space for each spec is represented by a type, the initial condition by a predicate 
saying whether or not a state is a legal initial state, actions by predicates over state pairs 
(the pair being the before and after states), and fairness conditions by predicates over
state histories. 

The spec is a collection of PVS theory files. Some had to be separate theories so they 
could be parameterized and called several times. The theory tree is shown in Figure 2. 


[Figure 2: diagram of the theory tree; recoverable node labels include digraph, network, invariants, and correctness.]

Fig. 2. Theory hierarchy.


The protocol’s type definitions (config) define the space of legal network confi- 
gurations and states, which need no notion of time. The initial conditions, actions, and 
fairness conditions are in a separate theory (network). These use signals, histories, 
temporal logic operators, and fairness, and PVS doesn’t provide those as built-in
libraries, so we wrote the pieces we needed (signals, temporal, and fairness). Many
lemmas about the protocol (invariants) are used in the proof of deadlock freedom
(deadfree), which itself is expressed using the built-in PVS [BS97] graph-theory library
(digraph_lib).

The signals, temporal, and fairness theories are re-used to express the proto- 
col’s top-level guarantees (abstract). The abstract and concrete protocol specs have 
different notions of state (and in an earlier version, had different notions of time). We 
need to augment the concrete spec with “auxiliary variables” (networkAux), and then
use a rather complex refinement mapping to extract an abstract history from the concrete 
protocol history before we can say that it must satisfy the abstract spec (correctness). 

3.1 Signals, Histories, Temporal Logic, and Fairness


Signals (signal) and temporal-logic operators (temporal) are trivial, and are shown
in listings 1 and 2.

signal[T: TYPE]: THEORY                                        % listing 1
BEGIN
  Signal: TYPE = [nat -> T]
END signal

temporal: THEORY                                               % listing 2
BEGIN
  IMPORTING signal[bool]

  Eventually(signal: Signal[bool]): Signal[bool] =
    LAMBDA(t1: nat): EXISTS(t2: nat): t2 >= t1 AND signal(t2)

  Always(signal: Signal[bool]): Signal[bool] =
    LAMBDA(t1: nat): FORALL(t2: nat): t2 >= t1 IMPLIES signal(t2)

END temporal
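
The two operators quantify over all later times, so they cannot be evaluated directly on an arbitrary signal. As a minimal executable sketch, one can restrict attention to "lasso" signals (a finite prefix followed by a loop repeated forever), for which looking ahead prefix-length plus loop-length steps is enough. The restriction to lassos, the explicit `horizon` argument, and the Python names are assumptions of this sketch and are not part of the PVS theories.

```python
from typing import Callable, List

Signal = Callable[[int], bool]


def lasso(prefix: List[bool], loop: List[bool]) -> Signal:
    """Signal that plays prefix once and then repeats loop forever."""
    def signal(t: int) -> bool:
        if t < len(prefix):
            return prefix[t]
        return loop[(t - len(prefix)) % len(loop)]
    return signal


def eventually(signal: Signal, horizon: int) -> Signal:
    """Bounded counterpart of Eventually: some t2 in [t1, t1 + horizon)."""
    return lambda t1: any(signal(t2) for t2 in range(t1, t1 + horizon))


def always(signal: Signal, horizon: int) -> Signal:
    """Bounded counterpart of Always: every t2 in [t1, t1 + horizon)."""
    return lambda t1: all(signal(t2) for t2 in range(t1, t1 + horizon))
```

For a lasso with prefix length p and loop length l, choosing `horizon = p + l` makes the bounded operators agree with the unbounded definitions, including when they are nested, because the result of either operator is again a lasso with the same two lengths.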


Theory fairness (listing 3) introduces types Action and History and defines weak and strong
fairness. State is an arbitrary parameter of the theory. Given an action, IsEnabled
returns a state predicate that says whether the action is enabled in a given state. The
action is enabled if there exists a next state such that the action occurs between the
current state and the next state. Given an action and a history, Occurs returns a signal
saying at what times the action occurred. MapPredicate maps a state predicate over
a state history, returning a signal saying at what times in the history the predicate is
satisfied. An action is weakly fair if it prohibits histories in which the event is forever
enabled yet never occurs. An action is strongly fair if it prohibits histories in which the
event is enabled infinitely often yet never occurs.


fairness[State: TYPE]: THEORY                                  % listing 3
BEGIN
  IMPORTING temporal, signal[State], signal[bool]

  Action : TYPE = [State, State -> bool]
  History: TYPE = Signal[State]

  IsEnabled(action: Action): [State -> bool] =
    LAMBDA (state: State): EXISTS (next_state: State): action(state, next_state)

  Occurs(action: Action, history: History): Signal[bool] =
    LAMBDA (t: nat): action(history(t), history(t+1))

  MapPredicate(predicate: [State -> bool], history: History): Signal[bool] =
    LAMBDA (t: nat): predicate(history(t))

  IsWeaklyFair(action: Action, history: History): bool =
    Eventually(Always(MapPredicate(IsEnabled(action), history)))(0)
    IMPLIES Always(Eventually(Occurs(action, history)))(0)

  IsStronglyFair(action: Action, history: History): bool =
    Always(Eventually(MapPredicate(IsEnabled(action), history)))(0)
    IMPLIES Always(Eventually(Occurs(action, history)))(0)
END fairness
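
The difference between the two fairness notions shows up already on very small examples. The sketch below assumes a finite state space and a history that eventually repeats a fixed loop of states forever; under that assumption "eventually always" reduces to "at every loop state" and "always eventually" to "at some loop state or loop step". The finite-state restriction and all Python names are assumptions made for illustration.

```python
def is_enabled(action, state, states):
    """An action is enabled in a state if some next state satisfies it."""
    return any(action(state, nxt) for nxt in states)


def loop_steps(loop):
    """Consecutive state pairs of the repeating loop, including wrap-around."""
    return [(loop[k], loop[(k + 1) % len(loop)]) for k in range(len(loop))]


def occurs_infinitely_often(action, loop):
    return any(action(s, n) for s, n in loop_steps(loop))


def is_weakly_fair(action, loop, states):
    """Forever enabled (from some point on) implies occurring again and again."""
    forever_enabled = all(is_enabled(action, s, states) for s in loop)
    return (not forever_enabled) or occurs_infinitely_often(action, loop)


def is_strongly_fair(action, loop, states):
    """Enabled infinitely often implies occurring again and again."""
    often_enabled = any(is_enabled(action, s, states) for s in loop)
    return (not often_enabled) or occurs_infinitely_often(action, loop)
```

Take states 0, 1, 2 and an action that moves from 0 to 2. A history looping 0, 1, 0, 1, ... enables the action infinitely often but not continuously, so it is weakly fair (vacuously) yet not strongly fair; this is the situation of a shared port that is repeatedly offered and repeatedly taken by someone else.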


3.2 The Abstract Spec (The Protocol’s Guarantees) 


The protocol’s guarantees are easy to get roughly right: all sent messages must be de- 
livered, and I/O messages from a particular source to a particular destination must be 
delivered in order. A few details tighten it up: only sent messages may be delivered.
The destination of a message may not always be ready to accept deliveries, but we can 
assume it will always eventually be ready. Given that assumption, the network must 
always eventually be ready to accept new sent messages. 

Actually, the network must guarantee that each message class accepts and delivers 
messages even if all other classes are blocked, so our assumption about the eventual 
readiness of destinations is expressed per class. 

We made the formal spec of the protocol’s guarantees as simple and abstract as 
possible, limited mostly by having to express the I/O-ordering constraint. We abstracted 
away most of the rest by parameterizing the theory. 

In the hardware, messages aren’t uniquely identifiable. We need to prohibit the pro- 
tocol from throwing away or duplicating a message in flight even when an identical one 
is also in flight. We avoid the problem in the property spec by giving each message a 
unique ID. As it happens, we also need a time stamp so we can express in-order delivery, 
so we use the message UID for both purposes. The UID doesn’t exist in the hardware 
implementation, and how we deal with that in the proof is explained in Section 5.

The theory abstract has several parameters. Source is the set of processor interfa- 
ces that can present the network with messages for delivery. Sink is the set of interfaces 
to which those messages are delivered. Class is an attribute of messages; we assume that 
Class contains at least two distinct classes (RDIO, WRIO). The type State represents the 
network with an unordered set of tuples (source ID, destination ID, message content), 
the counter value to express ordering between I/O messages, CanSource which says 
whether the network is ready to receive new messages from a source port, and CanSink 
which says whether a sink port is ready to receive a delivered message. 


abstract[PID: TYPE+, Source: TYPE+, Sink: TYPE+, Class: TYPE+,   % listing 4
         MessageBody: TYPE+, ... more parameters ...]: THEORY
BEGIN
  ASSUMING
    RDIO: Class
    WRIO: Class
    rw_distinct: ASSUMPTION RDIO /= WRIO
  ENDASSUMING

  fset_lib: LIBRARY "/usr/share/pvs-2.3/lib/finite_sets"

  SourceID : TYPE = [# proc: PID, port: Source #]
  SinkID   : TYPE = [# proc: PID, target: Sink #]
  Message  : TYPE = [# class: Class, body: MessageBody, counter: nat #]

  IMPORTING fset_lib@finite_sets[[SourceID, SinkID, Message]]

  State : TYPE = [# Network   : finite_set[[SourceID, SinkID, Message]],
                    Counter   : nat,
                    CanSource : [SourceID, Class -> bool],
                    CanSink   : [SinkID, Class -> bool] #]

  IMPORTING fairness[State]

  Action  : TYPE = fairness[State].Action
  History : TYPE = fairness[State].History


The abstract network has five possible actions, shown in listing 5 and described below.
NewMessage puts a specified new message into the network for delivery from a specified
source port to a specified destination port. After NewMessage, CanSource may be false
for that class at that port. The action AssertCanSource sets it back to true to let
more messages be sent. DeliverMessage delivers a specified message to the specified
destination. AssertCanSink lets more messages of that class be delivered to that port.
StutterStep does nothing, and is the abstraction of actions in the implementation that
have no visible effect at the abstract level.


NewMessage(src: SourceID, dst: SinkID, m: Message): Action =   % listing 5
  LAMBDA (s, s_n: State):
    CanSource(s)(src, m`class)
    AND m`counter = Counter(s)
    AND CanSometimesSink(dst`target, m`class)
    AND (src`proc = dst`proc IMPLIES LocalRouteAllowed(src`port, dst`target))
    AND Network(s_n) = add((src, dst, m), Network(s))
    AND Counter(s_n) = Counter(s)+1
    AND ( CanSource(s_n) = CanSource(s)
          OR CanSource(s_n) = CanSource(s) WITH [(src, m`class) := FALSE] )
    AND CanSink(s_n) = CanSink(s)

DeliverMessage(dst: SinkID, m: Message): Action =
  LAMBDA (s, s_n: State):
    CanSink(s)(dst, m`class)
    AND EXISTS (src: SourceID):
          member((src, dst, m), Network(s))
      AND ( ( m`class = RDIO OR m`class = WRIO )
            IMPLIES FORALL (m2: IOMessage):
                      member((src, dst, m2), Network(s))
                      IMPLIES m2`counter >= m`counter )
      AND Network(s_n) = remove((src, dst, m), Network(s))
      AND Counter(s_n) = Counter(s)
      AND ( CanSource(s_n) = CanSource(s)
            OR CanSource(s_n) = CanSource(s) WITH [(src, m`class) := TRUE] )
      AND ( CanSink(s_n) = CanSink(s) WITH [(dst, m`class) := FALSE]
            OR CanSink(s_n) = CanSink(s) )

AssertCanSource(src: SourceID, class: Class): Action =
  LAMBDA (s, s_n: State):
    CanSometimesSource(src`port, class) = TRUE
    AND CanSource(s)(src, class) = FALSE
    AND CanSource(s_n) = CanSource(s) WITH [(src, class) := TRUE]
    AND Network(s_n) = Network(s)
    AND Counter(s_n) = Counter(s)
    AND CanSink(s_n) = CanSink(s)

AssertCanSink(dst: SinkID, class: Class): Action =
  LAMBDA (s, s_n: State):
    CanSometimesSink(dst`target, class) = TRUE
    AND CanSink(s)(dst, class) = FALSE
    AND CanSink(s_n) = CanSink(s) WITH [(dst, class) := TRUE]
    AND Network(s_n) = Network(s)
    AND Counter(s_n) = Counter(s)
    AND CanSource(s_n) = CanSource(s)

StutterStep: Action = LAMBDA (s, s_n: State): s = s_n

NewMessage for some source, destination, and message occurs between an initial
and after state iff

• the network is initially ready to accept a message of this class at this port, and
• the message is stamped with the initially current time, and
• the destination port accepts messages of this class, and
• if the message is local, this source port may send to this target, and
• the network afterward has its initial contents plus the message, and
• the timer afterward is one greater than it was initially, and
• the network may or may not be able to accept such messages from that port afterward,
  but the status of other input ports is unchanged, and
• the status of output ports is unchanged.


Note that the first two clauses depend only on the initial state, and the second two 
depend on the parameters and are independent of the state. These four together are the 
action’s enabling condition. The others relate the result state to the initial state. Actions 
are predicates, not functions, to allow nondeterminism, and the value of CanSource 
after NewMessage shows how that’s used. Finally, it’s easy to forget to put in the last 
clause, but omitting it allows unintended nondeterminism — the status of output ports 
afterward could be anything. With it, the abstract actions are mutually exclusive. The 
implementation may do several actions at once because our mapping of time allows us 
to observe the abstract state intermittently. 
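
The point about actions being predicates rather than functions can be made concrete. The following is a minimal sketch in which ports and classes are plain strings and the static checks on ports (CanSometimesSink, LocalRouteAllowed) are left out; the `State` dataclass and the `new_message` function are hypothetical stand-ins for the PVS declarations, not a translation of them.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Message = Tuple[str, str, int]      # (class, body, counter)
Entry = Tuple[str, str, Message]    # (source, destination, message)


@dataclass(frozen=True)
class State:
    network: FrozenSet[Entry]
    counter: int
    can_source: FrozenSet[Tuple[str, str]]  # ready (source, class) pairs
    can_sink: FrozenSet[Tuple[str, str]]    # ready (destination, class) pairs


def new_message(src: str, dst: str, m: Message):
    """Return the NewMessage action as a predicate over (s, s_n)."""
    cls, _, stamp = m

    def action(s: State, s_n: State) -> bool:
        return ((src, cls) in s.can_source                  # enabling condition
                and stamp == s.counter                      # enabling condition
                and s_n.network == s.network | {(src, dst, m)}
                and s_n.counter == s.counter + 1
                # Nondeterminism: the port may or may not stay ready.
                and s_n.can_source in (s.can_source,
                                       s.can_source - {(src, cls)})
                # The easily forgotten last clause: sinks are untouched.
                and s_n.can_sink == s.can_sink)

    return action
```

Two different after states satisfy the predicate from the same initial state (the source port either stays ready or does not), while an after state that also changes a sink port is rejected only because of the final clause.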

DeliverMessage is enabled iff

• the environment is ready to accept that message class at this port, and
• an in-flight message has the specified destination and content, and
• if the message is RDIO or WRIO, it is the oldest I/O message in the net from its source
  to its destination

and actually occurs between an initial and after state iff

• the network afterward has its initial contents minus the message, and
• the network may be able to accept such messages from the source port afterward
  but the status of other input ports is unchanged, and
• the destination port may or may not be able to accept the delivery of such messages
  afterward but the status of other output ports is unchanged.


AssertCanSource (at some port and class) is enabled if CanSource is currently 
false there, and if that port is ever allowed to source such messages. The action occurs if 
CanSource becomes true there and all other system state is unchanged. AssertCanSink 
is the equivalent action for destination ports. A StutterStep does nothing. 
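
The enabling condition of DeliverMessage, and in particular the in-order rule for I/O traffic, can be sketched as a small check. The tuple encoding of messages, the class name "REQ" for a non-I/O class, and the function name `deliver_enabled` are assumptions made for illustration.

```python
IO_CLASSES = {"RDIO", "WRIO"}


def deliver_enabled(network, can_sink, dst, m):
    """True if message m = (class, body, counter) may be delivered to dst.

    network is a set of (source, destination, message) triples and
    can_sink a set of ready (destination, class) pairs.
    """
    cls, _, stamp = m
    if (dst, cls) not in can_sink:
        return False
    for src, d, msg in network:
        if d != dst or msg != m:
            continue
        if cls not in IO_CLASSES:
            return True
        # I/O messages: no older I/O message may be in flight on src -> dst.
        if all(m2[2] >= stamp
               for s2, d2, m2 in network
               if s2 == src and d2 == dst and m2[0] in IO_CLASSES):
            return True
    return False
```

With two WRIO messages stamped 0 and 1 in flight from the same source to the same destination, only the one stamped 0 is deliverable, whereas a message of a non-I/O class on the same pair is not held back by them.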

The action AbstractNetworkNext (shown in listing 6) holds between two states when any one
of the five allowed actions does.


AbstractNetworkNext: Action =                                  % listing 6
  LAMBDA (s, s_n: State):
    EXISTS (src: SourceID, dst: SinkID, cl: Class, m: Message):
      NewMessage(src, dst, m)(s, s_n) OR
      DeliverMessage(dst, m)(s, s_n) OR
      AssertCanSource(src, cl)(s, s_n) OR
      AssertCanSink(dst, cl)(s, s_n) OR
      StutterStep(s, s_n)


The fairness requirement (shown in listing 7) guarantees that every message that enters the
network will be delivered. The first clause says that if a destination port is available often 
enough and there is a message in flight to it, the message will eventually be delivered to 
it. The second clause says that if all destination ports are eventually ready to sink packets 
in a class, then all ports will eventually be ready to accept messages in that class. 

AbstractNetworkFairness(h: History): bool =                    % listing 7
  ( FORALL (dst: SinkID, m: Message):
      IsStronglyFair(DeliverMessage(dst, m), h)
  ) AND
  ( FORALL (class: Class):
      (FORALL (dst: SinkID): IsWeaklyFair(AssertCanSink(dst, class), h))
      IMPLIES
      (FORALL (src: SourceID): IsStronglyFair(AssertCanSource(src, class), h))
  )


In the initial condition, the set of in-flight messages is empty, the time stamp is zero, 
and all ports that can potentially accept messages are ready. 

AbstractNetworkSpec (listing 8) specifies the set of allowed behaviours. The initial
state must satisfy the initial condition, all subsequent states in the behaviours are obtained 
by executing one of the possible actions, and the fairness requirements should hold. 


AbstractNetworkInit(s: State): bool =                          % listing 8
  s = (# Network   := emptyset,
         Counter   := 0,
         CanSource := LAMBDA (src: SourceID, class: Class):
                        CanSometimesSource(src`port, class),
         CanSink   := LAMBDA (dst: SinkID, class: Class):
                        CanSometimesSink(dst`target, class) #)

AbstractNetworkSpec(h: History): bool =
  AbstractNetworkInit(h(0)) AND
  Always(Occurs(AbstractNetworkNext, h))(0) AND
  AbstractNetworkFairness(h)


3.3 The Concrete Spec (How the Protocol Works)


The concrete spec says how the protocol works. It describes a 2D torus with possibly 
failing links, legal configurations, and the routing algorithm. The basic theory axis
(listing 9) defines the two dimensions East-West and North-South, the four compass-point
directions, the primary axis, non-negative cartesian coordinates for those axes, and a few
additional constants.

axis: THEORY                                                   % listing 9
BEGIN
  Axis : TYPE = {NS, EW}
  PrimaryAxis: Axis

  OppositeAxis(axis: Axis): Axis = CASES axis OF NS: EW, EW: NS ENDCASES

  SecondaryAxis : Axis = OppositeAxis(PrimaryAxis)
  Direction : TYPE = {N, S, E, W}
  NonnegCoordinates : TYPE = [Axis -> nat]

  OppositeDirection(d: Direction): Direction =
    CASES d OF N: S, S: N, E: W, W: E ENDCASES
  IsPositive(d: Direction): bool = d = N OR d = E
  DirectionsOfAxis(a: Axis): TYPE =
    { d : Direction | IF a = NS THEN d = N OR d = S ELSE d = E OR d = W ENDIF }
  AxisOfDirection(d: Direction): Axis =
    CASES d OF N: NS, S: NS, E: EW, W: EW ENDCASES
END axis

The theory config (listing 10) describes legal configurations of the network and its static
control registers.


config[(IMPORTING axis) PhantomCorner: NonnegCoordinates,      % listing 10
       (IMPORTING coordinates[PhantomCorner]) InnerCorner: CoordInRect,
       Source: TYPE+, Sink: TYPE+, Class: TYPE+, ... ]: THEORY
BEGIN

  BrokenLinks : [Pid, Direction -> bool]
  Coordinate(a: Axis): TYPE = { n : nat | n <= PhantomCorner(a) }
  BufferID   : TYPE = [# proc: Pid, port: InputPort, cl: Class, ch: Channel #]
  IPBufferID : TYPE = { bid: BufferIDRep | is_internal(port(bid)) }
  EPBufferID : TYPE = { bid: BufferIDRep | is_external(port(bid)) }
  SinkID     : TYPE = [# proc: Pid, port: Sink, cl: Class #]

  RouteType : TYPE = [# dir : [a: Axis -> DirectionsOfAxis(a)],
                        val : [a: Axis -> Coordinate(a)],
                        FirstAxis : Axis,
                        ... more fields ... #]

  RoutingTable: TYPE = [Pid -> [Pid -> RouteType]]
  VCSelect : TYPE =
    [Pid -> [d: Direction -> [Coordinate(AxisOfDirection(d)) -> VirtChannel]]]
  Neighbors : TYPE =
    [Pid -> [Direction -> [# proc: Pid, port: Direction, working: bool #]]]

  Network : TYPE = [# RoutingTable: RoutingTable, VCSelect: VCSelect #]


The theory network (listing 11) has several parameters, including the routing tables, the
assignment of virtual channels (VC), and their respective datelines. The assumptions
restrict these parameters to have legal values. For instance, one VCSelectAllowed
assumption prevents switches from VC0 to VC1 to avoid intra-dimension deadlock.

The network state is described with a record that contains the value of each buffer,
the number of free buffer entries at each input port, the wires through which a node
receives deallocs from neighbors, the wires through which a node receives packets from
neighbors, and the wires CanSink through which the environment asserts its readiness
to accept message delivery.


network[Source: TYPE, Sink: TYPE, Message: TYPE, Class: TYPE,  % listing 11
        RoutingTable: RoutingTable, VCSelect: VCSelect,
        LocalRouteAllowed: [Source, Sink -> bool], ... ]: THEORY
BEGIN
  ASSUMING
    VCSelectAllowed(vc_select: VCSelect, rt: RoutingTable): bool = ...
    GoodRoutingTable : ASSUMPTION RoutingTableAllowed(RoutingTable)
    GoodVCSelect : ASSUMPTION VCSelectAllowed(VCSelect, RoutingTable)
  ENDASSUMING

  Header: TYPE = [# dir : [a: Axis -> DirectionsOfAxis(a)],
                    val : [a: Axis -> Coordinate(a)],
                    vc  : VirtChannel,
                    ... more fields ... #]

  Packet: TYPE = [# route : Header, dest : Sink, class : Class, msg : Message #]
  BufferEntry: TYPE = [# pkt : Packet,
                         wait : finseq[bool],
                         outports : finite_set[[OutputPort, Channel]] #]
  State: TYPE = [# Buffers : [BufferID -> [# Packets : finseq[BufferEntry],
                                             Deallocs : nat #]],
                   FreeBufferEntries : [EPBufferID -> nat],
                   DeallocsIn : [Pid -> [Direction -> finseq[Dealloc]]],
                   PacketsIn : [Pid -> [Direction -> finseq[Packet]]],
                   CanSink : [SinkID -> bool] #]

  IMPORTING fairness[State]
  Action  : TYPE = fairness[State].Action
  History : TYPE = fairness[State].History

The protocol allows these actions:

NewMessage: A node accepts a new message from an internal port for delivery to a
destination port. It builds a new packet and constructs the set of port/channel pairs
that this packet can route through next. The source and destination classes must
match, and the destination must be able to receive packets.

ReceivePacket: A node moves a packet from a specified external port to an input
buffer, adding info about legal next hops.

RoutePacket: A node moves a packet from a specified internal or external port
buffer to a specified external port, updates the wait-bitvector if it’s an IO packet, and
decrements the count of free buffer entries.

SendDealloc: A node sends a buffer-deallocation message (a dealloc) to a neighbor
saying that a specified buffer has free entries.

ReceiveDealloc: A node receives a dealloc from a neighbor, and increases its
count of free buffer entries.

AssertCanSink: A specified destination port signals that it is ready for the delivery
of messages of a specified class.

DeliverMessage: A specified message is moved from a specified buffer to a
specified destination port. The destination may or may not be ready to accept more.

The code for two actions is shown in listing 12, with NetworkNext.

ReceivePacket(i: Pid, d: Direction): Action = LAMBDA (s, s_n: State):   % listing 12
  length(PacketsIn(s)(i)(d)) > 0 AND
  LET newPkt     = Head(PacketsIn(s)(i)(d)),
      newPktCh   = VC(route(newPkt)),
      inBuff     = (# proc := i, port := d, cl := class(newPkt), ch := newPktCh #),
      wrioBuffer = inBuff WITH [(cl) := WRIO],
      outPorts   = OutPorts(i, DirectionToInputPort(d), newPkt),
      WRIOQueue  = Packets(Buffers(s)(wrioBuffer)),
      wait_bv    = ...
  IN s_n = s WITH [(Buffers)(inBuff)(Packets) :=
                     Append(Packets(Buffers(s)(inBuff)),
                            (# pkt := newPkt, wait := wait_bv, outports := outPorts #)),
                   (PacketsIn)(i)(d) := Tail(PacketsIn(s)(i)(d)) ]

ReceiveDealloc(i: Pid, d: Direction): Action = LAMBDA (s, s_n: State):
  length(DeallocsIn(s)(i)(d)) > 0
  AND LET newD        = Head(DeallocsIn(s)(i)(d)),
          freeClass   = PROJ_1(newD),
          freeChannel = PROJ_2(newD),
          freeBuffer  = (# proc := i, port := DirectionToInputPort(d),
                            cl := freeClass, ch := freeChannel #)
      IN s_n = s WITH [(FreeBufferEntries)(freeBuffer) :=
                         FreeBufferEntries(s)(freeBuffer) + 1,
                       (DeallocsIn)(i)(d) := Tail(DeallocsIn(s)(i)(d)) ]

... other actions ...

NetworkNext : Action =
  LAMBDA (s, s_n: State):
    EXISTS (i, j: Pid, ip: Sink, ep: Direction, srcBuf: BufferID, epBuf: EPBufferID,
            ipBuf: IPBufferID, dstBuf: SinkID, m: Message, dstCh: Channel,
            srcEntryIndex: below[length(Packets(Buffers(s)(srcBuf)))]):
      NewMessage(ipBuf, dstBuf, m)(s, s_n) OR ReceivePacket(i, ep)(s, s_n)
      OR RoutePacket(srcBuf, ep, srcEntryIndex, dstCh)(s, s_n)
      OR SendDealloc(epBuf)(s, s_n) OR ReceiveDealloc(i, ep)(s, s_n)
      OR DeliverMessage(srcBuf, ip, m, srcEntryIndex)(s, s_n)
      OR AssertCanSink(dstBuf)(s, s_n) OR StutterStep(s, s_n)
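
The bookkeeping of free buffer entries described for RoutePacket, SendDealloc and ReceiveDealloc amounts to credit-based flow control. The sketch below models a single one-way link and is an assumption-laden illustration: the `Link` class and its method names are hypothetical, the wire and the input buffer are merged into one queue, and the rule that a packet cannot be routed while the count is zero is inferred from the deadlock discussion rather than quoted from the spec.

```python
from collections import deque


class Link:
    """One-way link from an upstream node into a downstream input buffer."""

    def __init__(self, buffer_size):
        self.free_entries = buffer_size  # upstream's count of free entries
        self.buffer = deque()            # packets held downstream
        self.pending_deallocs = 0        # entries freed but not yet reported
        self.deallocs_in = deque()       # deallocs in flight toward upstream

    def route_packet(self, pkt):
        """Upstream sends pkt if it believes an entry is free."""
        if self.free_entries == 0:
            return False
        self.free_entries -= 1
        self.buffer.append(pkt)
        return True

    def forward_downstream(self):
        """Downstream passes its oldest packet on, freeing one entry."""
        pkt = self.buffer.popleft()
        self.pending_deallocs += 1
        return pkt

    def send_dealloc(self):
        """Downstream reports one freed entry to upstream."""
        if self.pending_deallocs > 0:
            self.pending_deallocs -= 1
            self.deallocs_in.append("dealloc")

    def receive_dealloc(self):
        """Upstream consumes a dealloc and regains one credit."""
        if self.deallocs_in:
            self.deallocs_in.popleft()
            self.free_entries += 1
```

With a one-entry buffer, a second packet is refused until the first has moved on and the resulting dealloc has travelled back, which is the waiting relation that the dependency graph in the deadlock proof has to account for.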

InternalActionsAreFair prohibits histories with unfair behavior. The
actions RoutePacket and DeliverMessage need to be strongly fair because the out- 
put ports and the sink ports are shared by packets coming from several directions. Weak 
fairness is sufficient for the other actions. 


InternalActionsAreFair(h:History): bool = 
FORALL (i:Pid, epBuffer:EPBufferID, srcBuffer:BufferID, dstPort:Sink, 
d:Direction, m:Message, destChannel:Channel, srcIndex:nat): 


IsStronglyFair( RoutePacket(srcBuffer, srcIndex, destChannel), h ) AND 
IsWeaklyFair ( ReceivePacket(i, a), h ) AND 
IsWeaklyFair ( SendDealloc(epBuffer), h ) AND 
IsWeaklyFair ( ReceiveDealloc(i, d), h ) AND 


IsStronglyFair( DeliverMessage(srcBuffer, dstPort, m, srcIndex), h ) 
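
The difference between the two fairness notions can be illustrated on a history that ends in a loop repeated forever. The sketch below is not the PVS theory fairness; the names Step, is_weakly_fair and is_strongly_fair, and the representation of a history by its repeating loop, are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Step:
    enabled: FrozenSet[str]  # actions enabled in the state before this step
    taken: str               # action actually performed by this step


def is_weakly_fair(action: str, loop: List[Step]) -> bool:
    """An action enabled at every step of the repeating loop must be taken in it."""
    continuously_enabled = all(action in s.enabled for s in loop)
    taken = any(s.taken == action for s in loop)
    return (not continuously_enabled) or taken


def is_strongly_fair(action: str, loop: List[Step]) -> bool:
    """An action enabled at some step of the repeating loop must be taken in it."""
    repeatedly_enabled = any(action in s.enabled for s in loop)
    taken = any(s.taken == action for s in loop)
    return (not repeatedly_enabled) or taken


# Hypothetical scenario: two packets compete for one shared output port.
# RouteB is enabled only on every other step and is never chosen.
loop = [
    Step(frozenset({"RouteA", "RouteB"}), "RouteA"),
    Step(frozenset({"RouteA"}), "RouteA"),
]

print(is_weakly_fair("RouteB", loop))    # weak fairness does not rule out this loop
print(is_strongly_fair("RouteB", loop))  # strong fairness does
```

In this toy loop RouteB is never continuously enabled, so weak fairness tolerates its starvation, which is the situation that arises when a port is shared by several directions.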

The initial state considers the network empty and the sink ports available. The 
predicate NetworkProtocolAllows ties together the initial condition, the 
state-transition function, and the fairness constraint. 


NetworkInit(s: State):bool = 
FORALL (bid : BufferID, epbid : EPBufferID, ipbid : SinkID, i:Pid, d:Direction): 
Buffers(s) (bid) = (# (Packets) := empty_seq, (Deallocs) := 0 #) 


AND FreeBufferEntries(s)(epbid) = BufferSize(epbid) 
AND CanSink(s)(ipbid) = CanSometimesSink(ipbid‘port, ipbid‘cl) 
AND DeallocsIn(s)(i)(d) = empty_seq AND PacketsIn(s)(i)(d) = empty_seq 


NetworkProtocolAllows(h:History): bool = 
NetworkInit(h(0)) 
AND Always (Occurs (NetworkNext ,h)) (0) 
AND InternalActionsAreFair(h) 


4 Formalization and Proof of Deadlock Freedom 


Fig. 3. Graph of dependencies between the packets. 


Our first verification step consists of proving that the network protocol implementation 
is free of deadlock. Actually, we prove that a dependency graph is free of cycles, 
rather than that an action is always eventually enabled. We will see in Section 5 how the 
same dependency graph is re-used to prove the liveness property. We used a similar 
technique in a previous work [MHJG00]. Dependency graphs are represented using the PVS 
directed-graphs library. Vertices are buffer entries (type VCNode in [15]). An entry nd1 
depends on another entry nd2 if the packet in nd2 is blocking the packet in nd1. Figure 3 
shows an example of a dependency between a packet in the C port buffer (L2-cache port) 
in Node1 and all packets in the East port buffer in Node2. It also shows intra-buffer 
dependencies between IO packets in Node2 (dashed lines) and dependencies between 
packets in the West port of Node2 and packets in the West port of Node1 (solid lines). 
Note that there is a non-IO dependency only when the destination buffer is entirely full. 
I/O packets depend on older I/O packets going to the same destination. 


deadfree [...] : THEORY 
BEGIN 
  digraph_lib: LIBRARY "/usr/share/pvs-2.3/lib/digraphs" 

  IMPORTING invariants[...] 

  Node : TYPE = [# buff_id : BufferID, index : below[BufferSize(buff_id)] #] 
  VCNode : TYPE = { nd: Node | nd‘buff_id‘ch = VC0 or nd‘buff_id‘ch = VC1 } 

  IMPORTING digraph_lib@path_ops[VCNode], digraph_lib@path_lems[VCNode] 

  DepGraph(s:State) : digraph[VCNode] = 
    (# vert := LAMBDA (nd:VCNode): TRUE, 
       edges := LAMBDA (nd1,nd2:VCNode): 
         IF is_internal(port(buff_id(nd2))) OR buff_id(nd1) = buff_id(nd2) 
         THEN FALSE 
         ELSE 
           LET Buffer_nd1 = buff_id(nd1), 
               Buffer_nd2 = buff_id(nd2), 
               Packets_nd1 = Packets(Buffers(s)(Buffer_nd1)), 
               Packets_nd2 = Packets(Buffers(s)(Buffer_nd2)), 
               dstPort = OppositeDirection(port(Buffer_nd2)), 
               dstChannel = ch(Buffer_nd2), 
               Deallocs_nd2 = Deallocs(Buffers(s)(Buffer_nd2)), 
               DeallocsIn_nd1 = DeallocsIn(s)(proc(Buffer_nd1))(dstPort), 
               dstBuf = Buffer_nd1 WITH [(port) := dstPort, (ch) := dstChannel ] 
           IN % Non IO dependencies: 
              ( cl(Buffer_nd1) = cl(Buffer_nd2) 
                AND index(nd1) < length(Packets_nd1) 
                AND index(nd2) < length(Packets_nd2) 
                AND Deallocs_nd2 = 0 
                AND FreeBufferEntries(s)(dstBuf) = 0 
                AND member((dstPort, dstChannel), outports(Packets_nd1(index(nd1)))) 
                AND NOT(EXISTS (k:below[length(DeallocsIn_nd1)]): 
                          DeallocsIn_nd1(k) = (cl(Buffer_nd2), ch(Buffer_nd2))) 
                AND Neighbors(Buffer_nd1‘proc)(dstPort)‘proc = Buffer_nd2‘proc 
              OR 
              % IO dependencies: 


Note that DepGraph does not include intra-buffer dependencies between I/O packets. 
We first prove the theorem NoDeadlockInv, which says that every walk (a sequence 
of vertices) in the dependency graph is a path (no vertex occurs twice). In other words, 
no cycle can be formed in any behaviour allowed by the implementation. Essentially, 
we prove that the invariant holds at the initial states and is preserved by every transition. 
The proof uses induction on the walk length and several invariants proved in the theory 
invariants. Some of these invariants assert that the state of a packet and its position in 
the network are consistent with its route information. 


NoDeadlockInv : THEOREM FORALL (h:network.History): 
LET NoDeadlock(s:State): bool = 


FORALL (w:Walk(DepGraph(s))): path? (DepGraph(s) ,w) 
IN NetworkProtocolAllows(h) IMPLIES Always (MapPredicate(NoDeadlock, h)) (0) 
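
The property "every walk is a path" is equivalent to the dependency graph having no cycle, since a walk that repeats a vertex contains a cycle. As a rough illustration only (the PVS proof is by invariance over all reachable states, not by search), the check on a single finite graph could look like the sketch below. The adjacency-list representation and the vertex names are hypothetical.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # vertex -> vertices it depends on


def has_cycle(graph: Graph) -> bool:
    """Depth-first search; a GREY vertex reached again closes a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    nodes = set(graph) | {m for succs in graph.values() for m in succs}
    colour = {n: WHITE for n in nodes}

    def visit(n: str) -> bool:
        colour[n] = GREY
        for m in graph.get(n, []):
            if colour[m] == GREY:
                return True
            if colour[m] == WHITE and visit(m):
                return True
        colour[n] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in nodes)


# Hypothetical buffer entries: c1 waits on e2, which waits on w1.
acyclic = {"c1": ["e2"], "e2": ["w1"], "w1": []}
# Adding w1 -> c1 closes a cycle, i.e. a potential deadlock.
cyclic = {"c1": ["e2"], "e2": ["w1"], "w1": ["c1"]}

print(has_cycle(acyclic), has_cycle(cyclic))
```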
Then we extend the dependency graph to include the intra-buffer dependencies [17]: 


% The following definition adds intra-buffer dependencies to the DepGraph. 
DepGraph2(s:State) : digraph[VCNode] = 
  (# vert := LAMBDA (nd:VCNode): TRUE, 
     edges := LAMBDA (nd1,nd2:VCNode): 
       edges(DepGraph(s))(nd1,nd2) 
       OR ( buff_id(nd1) = buff_id(nd2) 
            AND (buff_id(nd1)‘cl = RDIO OR buff_id(nd1)‘cl = WRIO) 
            AND index(nd1) < index(nd2) 
            AND LET Packets_nd1 = Packets(Buffers(s)(buff_id(nd1))), 
                    Packets_nd2 = Packets(Buffers(s)(buff_id(nd2))), 
                    Entry1 = Packets_nd1(index(nd1)), 
                    Entry2 = Packets_nd1(index(nd2)) 
                IN Entry1‘pkt‘route‘val = Entry2‘pkt‘route‘val 
          ) 


We prove the theorem NoDeadlock2Inv by showing that intra-buffer dependencies 
do not introduce cycles. This follows from the definition DepGraph2 and 
theorem NoDeadlockInv. 


NoDeadlock2Inv : THEOREM FORALL (h:network.History): 
LET NoDeadlock(s:State): bool = 
FORALL (w:Walk(DepGraph2(s))): path? (DepGraph2(s) ,w) 
IN NetworkProtocolAllows(h) IMPLIES Always (MapPredicate(NoDeadlock, h))(0) 


5 Formalization and Proof of the Liveness Property 


In order to show that the network protocol satisfies its abstract correctness specification, 
we construct a refinement mapping which maps every concrete history allowed by the 
implementation into an abstract history allowed by the abstract specification. In order to 
simplify the definition of the refinement map, we have extended the concrete state with 
an auxiliary variable to record the ID of the packet (from counter) and the node that 
sourced it (proc, port). We make sure the transitions never use the values of auxiliary 
variables; they only move them around [19]. PVS allows us to do that in a clean way 
by creating an instance of the theory network with the new type Message (lines 1, 2, 3, 4 
in [19]). 


networkAux [...] : THEORY 

  Message : TYPE = [# body: MessageBody, 
                      source: [# proc: Pid, port: Source #], 
                      counter: nat #] 
  IMPORTING network[...,Message,...] 

  StateAux : TYPE = [# state : State, Counter : nat #] 

  IMPORTING fairness[StateAux] 
  Action : TYPE = fairness[StateAux].Action 


NetworkAuxInit(s: StateAux):bool = NetworkInit(s‘state) AND s‘Counter = 0 


NewMessageAux(srcBuffer:IPBufferID, dstBuffer:SinkID, m:Message): Action 
= LAMBDA (s,s_n: StateAux): 

m‘source = (# proc := srcBuffer‘proc, port := srcBuffer‘port #) 

AND m‘counter = s‘Counter 

AND NewMessage(srcBuffer, dstBuffer, m)(s‘state,s_n‘state) 

AND s_n‘Counter = s‘Counter + 1 


MapState [20] maps a concrete state from networkAux into an abstract state from 
abstract. A tuple e is in flight in the abstract network if there exists an entry in any 
buffer or wire in the concrete network holding a packet travelling from the source e‘1 
to the destination e‘2 carrying the message e‘3. CanSource is set in the abstract state 
iff the buffer of the input port in the corresponding concrete state is not full. 


correctness[Source: TYPE+, Sink: TYPE+, ...] : THEORY 
BEGIN 
  c : THEORY = networkAux[PhantomCorner, InnerCorner, Source, ...] 

  a : THEORY = abstract[Pid, Source, Sink, MessageBody, Class, ...] 

  MapState(s:c.StateAux): a.State = 
    (# Network := LAMBDA (e: [a.SourceID, a.SinkID, a.Message]): 
         (EXISTS (srcBuffer:BufferID): 
           EXISTS (k:below[length(s‘state‘Buffers(srcBuffer)‘Packets)]): 
             e‘1 = s‘state‘Buffers(srcBuffer)‘Packets(k)‘pkt‘msg‘source 
             AND e‘2 = (# proc := epsilon! (i:Pid): WhoAmI(i) = 
                            s‘state‘Buffers(srcBuffer)‘Packets(k)‘pkt‘route‘val, 
                          target := s‘state‘Buffers(srcBuffer)‘Packets(k)‘pkt‘dest #) 
             AND e‘3‘class = s‘state‘Buffers(srcBuffer)‘Packets(k)‘pkt‘class 
             AND e‘3‘body = s‘state‘Buffers(srcBuffer)‘Packets(k)‘pkt‘msg‘body 
             AND e‘3‘counter = s‘state‘Buffers(srcBuffer)‘Packets(k)‘pkt‘msg‘counter) 
         OR (EXISTS (i:Pid, d:Direction): 
           EXISTS (k:below[length(s‘state‘PacketsIn(i)(d))]): 
             e‘1 = s‘state‘PacketsIn(i)(d)(k)‘msg‘source 
             AND e‘2 = (# proc := epsilon! (j:Pid): WhoAmI(j) = 
                            s‘state‘PacketsIn(i)(d)(k)‘route‘val, 
                          target := s‘state‘PacketsIn(i)(d)(k)‘dest #) 
             AND e‘3‘class = s‘state‘PacketsIn(i)(d)(k)‘class 
             AND e‘3‘body = s‘state‘PacketsIn(i)(d)(k)‘msg‘body 
             AND e‘3‘counter = s‘state‘PacketsIn(i)(d)(k)‘msg‘counter), 
       Counter := s‘Counter, 
       CanSource := LAMBDA (src:a.SourceID,cl:Class) : 
         LET source = (# proc := src‘proc, port := src‘port, cl := cl #) 
         IN length(s‘state‘Buffers(source)‘Packets) < BufferSize(source), 
       CanSink := LAMBDA (dst:a.SinkID,cl:Class) : 
         LET sink = (# proc:= dst‘proc, port:= dst‘target, cl := cl #) 
         IN s‘state‘CanSink(sink) 
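
The idea behind the Network component of MapState can be sketched in a few lines: the abstract network is just the set of in-flight (source, destination, message) tuples, collected from every buffer and wire. The Python below is a simplified illustration, not a translation of the PVS; the types Packet and ConcreteState, the string-valued identifiers and the example packet are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Packet:
    source: str    # node and port that sourced the message (auxiliary data)
    dest: str      # destination sink
    body: str
    counter: int   # auxiliary stamp recorded when the message was created


@dataclass
class ConcreteState:
    buffers: Dict[str, List[Packet]]  # buffer id -> queued packets
    wires: Dict[str, List[Packet]]    # wire id -> packets in transit
    counter: int


AbstractNetwork = FrozenSet[Tuple[str, str, Tuple[str, int]]]


def map_state(s: ConcreteState) -> AbstractNetwork:
    """Collect every packet held in any buffer or wire as an abstract tuple."""
    in_flight = set()
    for queue in list(s.buffers.values()) + list(s.wires.values()):
        for p in queue:
            in_flight.add((p.source, p.dest, (p.body, p.counter)))
    return frozenset(in_flight)


# Hypothetical step: a packet leaves the east buffer and enters the east wire.
pkt = Packet(source="node1.cpu", dest="node2.mem", body="read", counter=0)
before = ConcreteState(buffers={"east": [pkt]}, wires={"east_out": []}, counter=1)
after = ConcreteState(buffers={"east": []}, wires={"east_out": [pkt]}, counter=1)

print(map_state(before) == map_state(after))
```

Under this mapping an internal routing step changes nothing in the abstract network, so it corresponds to a stuttering step of the abstract specification.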


Our goal is to prove the following theorem, where o denotes function composition: 


networkSatisfiesabstract: THEOREM FORALL (c_history: c.History) : 
c.NetworkAuxProtocolAllows(c_history) IMPLIES 
a. AbstractNetworkSpec(MapState o c_history) 


We have split the proof into three cases. First, we prove that concrete initial states map 
into abstract initial states. Second, we prove that for every action taken in the concrete 
history there exists a corresponding abstract action taken in the abstract history. Third, 
we prove that the implementation satisfies the fairness requirements described by the 
abstract specification. 

The proofs of the following two lemmas make use of additional invariants we have 
proved in theory correctness. 


networkSatisfiesabstractInit: LEMMA FORALL (s: StateAux): 
NetworkAuxInit(s) IMPLIES AbstractNetworkInit (MapState(s)) 


networkSatisfiesabstractNext: LEMMA FORALL (h: c.History, n:nat): 
c.NetworkNextAux(h(n) ,h(n+1)) IMPLIES 
a. AbstractNetworkNext (MapState(h(n)), MapState(h(n+1))) 


For instance, we needed the invariant saying that every packet in the concrete network 
has a distinct stamp [23]. 
CounterIsUnique(s:c.StateAux): bool = 
  FORALL (b1, b2:BufferID, index1:below[length(Packets(Buffers(s‘state)(b1)))], 
          index2:below[length(Packets(Buffers(s‘state)(b2)))]): 
    ((b1 /= b2 OR index1 /= index2) 
      IMPLIES Buffers(state(s))(b1)‘Packets(index1)‘pkt‘msg‘counter /= 
              Buffers(state(s))(b2)‘Packets(index2)‘pkt‘msg‘counter 
    ) AND 
    FORALL (i:Pid,d:Direction,k:below[s‘state‘PacketsIn(i)(d)‘length]): 
      s‘state‘PacketsIn(i)(d)‘seq(k)‘msg‘counter /= 
      Buffers(state(s))(b1)‘Packets(index1)‘pkt‘msg‘counter 


CounterIsUniqueInv: LEMMA FORALL (c_history: c.History) : 
c.NetworkProtocolAllows(h) IMPLIES 
Always (MapPredicate(CounterIsUnique, h)) (0) 
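
The uniqueness invariant is simple to state operationally: no two packets anywhere in the network carry the same stamp. A minimal sketch of that check on one state follows; it is slightly stronger than the PVS predicate above because it also compares packets on wires with one another. Representing each packet by its counter value alone is an assumption made for brevity.

```python
from typing import Dict, List


def counter_is_unique(buffers: Dict[str, List[int]],
                      wires: Dict[str, List[int]]) -> bool:
    """True when every stamp in every buffer and wire occurs exactly once."""
    stamps = [c for queue in buffers.values() for c in queue]
    stamps += [c for queue in wires.values() for c in queue]
    return len(stamps) == len(set(stamps))


# Hypothetical states: stamps 0, 1, 2 are distinct; a repeated 1 violates the invariant.
print(counter_is_unique({"east": [0, 1]}, {"east_out": [2]}))
print(counter_is_unique({"east": [0, 1]}, {"east_out": [1]}))
```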
To prove [24] we re-use the dependency graph and the theorem NoDeadlock2Inv, 
which shows that there is no cycle in that graph. 


networkSatisfiesabstractFair: LEMMA FORALL (h: c.History): 
c.NetworkProtocolAllows(h) IMPLIES a.InternalActionsAreFair(h) 


Essentially, we prove that every packet (buffer entry) eventually does not depend 
on any other packet. This means that the fanout of any node in the dependency graph 
eventually becomes zero [25], and consequently allows the packet to move a step 
forward using fairness. 


Unblocked(nd1:VCNode): [State -> bool] = 
  LAMBDA (s:State): NOT(EXISTS (nd2:VCNode): edges(DepGraph2(s))(nd1,nd2)) 


every_edge_eventually_disappears : THEOREM 
FORALL (c_history: c.History) : 
c.NetworkAuxProtocolAllows(c_history) IMPLIES 

FORALL (nd: VCNode): 

Always (Eventually (MapPredicate(Unblocked(nd), c_history))) (0) 

Then we use the fact that the distance of the packet to its destination decreases at each 
hop in order to prove that the packet eventually reaches its destination and is eventually 
delivered (by fairness of c. DeliverMessage). 

Again, in order to decompose the proof, we prove several lemmas. 
stability_lemma [26] says that if an edge is permanent, at some point the path of 
permanent edges it depends on will saturate. The proof of this lemma uses the finiteness 
of the dependency graph and absence of cycles. 


stability_lemma : LEMMA FORALL (c_history:c.History, t:nat, nd0,nd1:VCNode): 
  c.NetworkAuxProtocolAllows(c_history) 
  AND Permanent((nd0,nd1),c_history,t) 
  IMPLIES EXISTS (t2:nat, nd2:VCNode, w:Walk(DepGraph2(c_history(t2)‘state))): 
    from?(w,nd0,nd2) 
    AND FORALL (i:below[l(w)]): Permanent((w(i),w(i+1)),c_history,t2) 
    AND FORALL (nd3:VCNode): (edges(DepGraph2(c_history(t2)‘state))(nd2,nd3) 
          IMPLIES NOT(Permanent((nd2,nd3),c_history,t2))) 
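
The combinatorial core of this lemma is that, in a finite graph without cycles, following edges from any vertex must stop at a vertex with no outgoing edge. The sketch below shows only that step, on a graph whose edges stand for permanent dependencies; the function name walk_to_sink and the example vertices are hypothetical, and the temporal part of the lemma (the choice of the time t2) is not modelled.

```python
from typing import Dict, List


def walk_to_sink(permanent: Dict[str, List[str]], start: str) -> List[str]:
    """Follow permanent edges from start until a vertex has none left."""
    walk = [start]
    while permanent.get(walk[-1]):
        nxt = permanent[walk[-1]][0]
        if nxt in walk:
            # Cannot happen when the graph is acyclic, as NoDeadlock2Inv guarantees.
            raise ValueError("cycle found; the acyclicity assumption is violated")
        walk.append(nxt)
    return walk


# Hypothetical permanent dependencies: nd0 -> nd1 -> nd2, and nd2 is unblocked.
permanent = {"nd0": ["nd1"], "nd1": ["nd2"], "nd2": []}
print(walk_to_sink(permanent, "nd0"))
```

The last vertex of the returned walk plays the role of nd2 in the lemma: it has no permanent outgoing edge, so fairness can be applied to it.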


The next lemma [27] says that every individual edge that appears in the graph 
eventually disappears (though it may reappear again later). 
It follows from stability_lemma and fairness conditions. 


every_edge_eventually_disappears : LEMMA 
  FORALL (c_history: c.History) : 
    NetworkAuxProtocolAllows(c_history) IMPLIES 
      FORALL (c_t:nat, nd1,nd2: VCNode): NOT(Permanent((nd1,nd2),c_history,c_t)) 
The next two lemmas [29] allow us to decouple I/O dependencies from normal 
dependencies. The first lemma says that if there is a dependency from nd1 to nd2 there 
should exist another dependency from nd1 to nd3 such that nd3 does not depend on any 
entry in a buffer within the same node (this can happen only between I/O buffers, by 
definition of DepGraph2). The second lemma says that the number of I/O dependency 
edges outgoing from a node decreases monotonically until it reaches zero, leaving only 
normal dependencies (that is, between buffers from different nodes). 


Permanent (e: edgetype [VCNode] ,c_history:c.History,t:nat) : bool = 
edges (DepGraph2(c_history(t) ‘state)) (e) 
AND FORALL (t2:nat): 
(t2 > t IMPLIES edges (DepGraph2(c_history(t2) ‘state))(e)) 


NoIODep(nd1:VCNode): [State -> bool] = LAMBDA (s:State): 
  FORALL (nd2:VCNode): ( edges(DepGraph2(s))(nd1,nd2) 
    IMPLIES buff_id(nd1)‘proc /= buff_id(nd2)‘proc ) 


NoIO_OR_Unblocked(nd:VCNode): [State -> bool] = 
LAMBDA (s:State): Unblocked(nd)(s) OR NoIODep(nd)(s) 


The predicate Permanent [28] says whether an edge is permanent. NoIODep says 
that all the dependencies (edges) outgoing from a node are not I/O dependencies, that 
is, the entries they depend on are in a neighboring node. 


delete_IO_dep : LEMMA FORALL (c_history:c.History, t:nat): 
  c.NetworkAuxProtocolAllows(c_history) 
  IMPLIES FORALL (nd1,nd2:VCNode): 
    Permanent((nd1,nd2),c_history,t) 
    IMPLIES ( NoIODep(nd2)(c_history(t)) 
              OR EXISTS (nd3:VCNode): 
                   ( Permanent((nd2,nd3),c_history,t) 
                     AND NoIODep(nd3)(c_history(t)) 
                   ) 
            ) 

IO_disappears_monotonically : LEMMA FORALL (c_history: c.History): 
  c.NetworkAuxProtocolAllows(c_history) IMPLIES 
    FORALL (nd: VCNode): 
      Always(Eventually(MapPredicate(NoIO_OR_Unblocked(nd), c_history)))(0) 


The lemma delete_IO_dep [29] follows from the definition of DepGraph2. The 
proof of IO_disappears_monotonically uses induction on the number of I/O edges 
outgoing from a node, and lemmas delete_IO_dep and [27]. 


6 Discussion 


This was an engineering effort, not research, so our lessons are about what’s cost effective 
rather than what’s possible. 

We've elided many details for the discussion, and also a major mode that made the 
spec somewhat more complex and added a third to the proof. We didn’t learn anything 
from those parts worth talking about. 

We didn’t verify many things we wanted to. We would have liked to verify that the 
spec was satisfiable for many configurations, that its fault tolerance worked under certain 
classes of failure, and that the coherence protocol plus additional pieces, layered on top of 
the network protocol, implemented the Alpha memory model. But those weren’t justified 
by a business need. 
6.1 Experience with Different Verification Tools 


We tried three verification systems. We started with model checking, but even with 
SMV’s [McM99] scalar sets, induction, and other reductions, we couldn’t reduce the 
state below 250 bits, mostly because of induction in two dimensions. Fortunately, it 
took very little time to figure that out. (We did use SMV successfully on the coherence 
protocol.) 

We then used TLA+ [Lam99] and TLC [YML99]. TLA+ is a specification language 
based on the Temporal Logic of Actions. It uses set theory and is untyped. We started 
with English specs, a draft partial TLA+ spec (written by Josh Scheid), and access to 
the protocol architects and implementors, and we extended and then rewrote the spec 
over the course of several months. TLC is an explicit-state model checker for TLA+, 
and we used it to find many bugs in our spec, though because TLA+ is untyped it often 
took long runs to find bugs we considered type errors. We were beta-testers of TLC and 
found many bugs in it, but the developers fixed them so fast that it rapidly stabilized, 
and its bugs didn’t slow us down. TLC added multithreading during our project, which 
let us run it 5-10 times faster. Even so we couldn’t exhaustively cover any but the most 
extremely limited configurations. We had no other proof tools for TLA+. 

Finally we translated the TLA+ spec into higher-order logic for PVS [SOR93] and 
started the proof there. The translation was trivial, and PVS type checking immediately 
found many more bugs, and in fact it found almost all the remaining ones. We also found 
and reported a variety of bugs in PVS itself, and those were not fixed during our project. 
Though we found ways of working around all of them, if we’d been using an open-source 
tool it would have been worth our effort to fix the bugs ourselves. 

The organization of theories and files was surprisingly constrained by how PVS 
passes parameters to theories. What was shown in Figure 2 as the single “config” theory 
is in fact three theory files, so that some parts could be parameterized and reused. 


6.2 Thoughts about Cost-Effectiveness 


The design was largely debugged by simulation before we started. We could have found 
some of those bugs had we started earlier, but that wouldn’t actually have saved any 
money; our “concrete” model is much more abstract than RTL, so simulation would have 
been necessary anyway. In theory, early formal specs and proof might have prevented 
bugs and so shortened our schedule, but that’s entirely speculative. Our contribution was 
to verify what simulation couldn’t: liveness and fairness. 

The proof took the most effort (5 intense person months), but took less time than 
writing the specs (depending on how you count, 5-10 months of two people). Type 
checking found almost all the bugs in the spec. PVS type checking can require full 
interactive theorem proving, but in practice the type-checking proofs were automatic or 
pretty trivial. Formal specification and thinking through the formal proof in detail found 
the two bugs in the design. If the job had been to find bugs rather than to prove their 
absence, the mechanically checked proof would not have added much value — writing 
the formal specs, typechecking them, and thinking through the formal proof did most of 
it. 
But we took on the verification to reduce risk; we didn’t expect to find protocol 
bugs. We found that risks are much smaller in hindsight — it’s harder to justify the effort 
after the fact, because risk reduction is unmeasurable. Partly because of that, we’re now 
verifying things that are important, hard for simulation, and where we’ll have lots of 
bugs to show for it afterward. 

PVS, like most tools, is designed to use only a single processor, and there’s an 
opportunity there. Even small groups now typically have lots of idle processors 
connected to a local net, and like many big organizations, we have literally hundreds of 
high-performance processors locally available. We distribute jobs among them to help 
simulation-based verification, and would have devoted them to help with the proof if we 
could have. Are there ways to use networked computers to ease theorem proving? 

We can claim two bugs, neither in the protocol proper. The first was in the 
implementation rather than the protocol. The protocol designers had reversed a design 
decision repeatedly and the implementation had settled in an inconsistent state. We 
brought the confusion to light while formalizing the spec, and coincidentally, simulation 
independently discovered it the same day. The second was an unrecognized 
configuration constraint in the protocol. A mode for routing around failures conflicted 
with a mode for improving performance, and though each worked separately, together 
they allowed deadlock. 


6.3 Thoughts about the Software Engineering of Formal Specs 


The specs aren’t large: 3 pages of library code, 6 pages of abstract spec, and 35 pages of 
network spec. The refinement map, deadlock-freedom property, liveness property, and 
74 lemmas were another 25 pages. The deadlock-freedom proof was about 20,000 lines, 
and the fairness proof about another 6,000. 

We used CVS for version control of the specs, but rather haphazardly. (CVS rather 
than another tool only because it was already being used for version control of the RTL.) 
Had there been more of us using the spec, version control would have been essential. 
Had there been more people doing the proof, version control of the proof would have 
been appropriate, too. 

The formal protocol spec needed and got a code review by designers. A good display 
form would have been nice but wasn’t necessary. Engineers and architects picked up the 
necessary higher-order logic, temporal logic, and obscure syntax without much effort. 
They could not have written in it without much practice, but understood it well enough 
to do very effective review. We should have reviewed more, and earlier. 

We repeatedly changed types and parameterization of theories and functions. We 
found that giving actions as few parameters as possible makes spec maintenance easiest. 
We completely de-parameterized the actions when we realized that. Unfortunately, we 
did so before writing the fairness spec and learning that anything quantified outside a 
fairness operator in the fairness spec must be an action parameter. So we had to put 
those parameters back in. There are still constants that should be parameters because we 
didn’t finish the changes. We would have benefited from software-engineering tools to 
help make such changes. Good browsing, outlining, and display tools would have been 
nice, too, both for us and during the code reviews. Had we been using SMV, we would 
have brought a machine to the reviews to check invariants in real time as they came up 
in discussion. 
1 Introduction 


Consider a statement about groups “For G a group, ...”. A naive approach to 
formalize this is to unfold the meaning of group, so that every statement about 
groups begins with 


For G a set, + an operation on G, + associative, e ∈ G, ... (1) 


This “unpackaged” approach can be improved by collecting all the parts of the 
meaning of group into a context, which need not be explicitly mentioned in 
every statement. A means of discharging some of the context is provided, so 
that statements made under that context can be instantiated with particular 
groups. However once the group context is discharged, all the parts of a group 
must be mentioned when using any general lemma about groups. Variations on 
this are supported by many proof tools, e.g. Coq’s Section mechanism [Coq99], 
Lego’s Discharge [LEG99], Automath contexts and Isabelle locales. 

A significant refinement is achieved by giving names to bits of context as 
in telescopes [dB91], or first-class contexts as in Martin-Löf’s framework with 
explicit substitutions [Tas97]. With these, we need not discharge a context to 
instantiate definitions and lemmas. But contexts or telescopes are “flat”; they 
don’t show that structures are built from existing structures, sharing some parts, 
and inheriting some properties. 

We informally define “packaging” as any approach to collecting the parts of 
mathematical structures, supporting more abstract manipulation of structures. 
Packaging is a two-edged sword: once structures are packaged to gain abstraction, 
we need more tools for manipulating them. 


Overview In Sect. 2 we show well-known inductively definable packaging 
constructions, Sigma types and inductive records, pointing out some issues of 
efficiency and abstraction that are not so well-known. Section 3 presents true 
records, including treatment of labels. This presentation is inspired by the work 
of Betarte and Tasistro [BT98,Bet98,Tas97], but considerably simplifies and 
explains that work. One of our simplifications is removal of record subtyping from 
the core description of records, in favor of the more general notion of coercive 
subtyping, which is briefly discussed in Sect. 4. 

Sect. 5, about making signatures more precise, discusses Pebble style sharing 
and manifest types in signatures. The main novelty of the present paper is a 
simple treatment of manifest types in signatures. The main idea I wish to sell is 
that manifest signatures are necessary for the actual (in practice) formalization 
of mathematics. 


Some notations I use [x:A]M for lambda-abstraction, (x:A)B for dependent 
function type, (A)B for non-dependent function type and M(N) for application. 
[_:A]M is a lambda-abstraction whose variable is not used in the body. Field 
labels are written r, p, to distinguish them from variables r, p. 


Acknowledgement This paper owes much to discussion with my colleagues in the 
Computer Assisted Reasoning Group at Durham, especially Zhaohui Luo. 


2 Definable Structures 


We begin with some structuring techniques that are inductively definable in 
dependent type theory. These are first-class constructions, and as such can be 
parameterized using lambda-abstraction. 

I will use partial equivalence relation (PER) as a running example. Let sym 
(resp. trn) express that R is a symmetric (resp. transitive) relation over S. 
PER is the telescope (2) or the informal record type (3) 


[S:Set][R:(S)(S)Prop][sAx:sym(S)(R)][tAx:trn(S)(R)], (2) 
(S:Set, R:(S)(S)Prop, sAx:sym(S)(R), tAx:trn(S)(R)). (3) 


An object with the signature (or fitting in to the telescope) PER is informally 

(S=T, R=Q, sAx=symQ, tAx=trnQ), 

where T:Set, Q:(T)(T)Prop, symQ:sym(T)(Q) and trnQ:trn(T)(Q). 


2.1 Sigma Types 


It is clear from (1) that any approach to packaging must handle dependency of 
later parts on earlier parts of the package. The simplest dependent package is 
pairs with Sigma types. For our purposes it is best to use a logical framework 
presentation of Sigma types (Fig. 1), where dependency is handled by the 
dependent function type of the framework (see rule FORM). While I have written 
the computation rules as typed equality, they can be implemented by syntactic 
reduction. 
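\[
\textsc{Form}\;\frac{A : \mathrm{Type} \qquad P : (A)\mathrm{Type}}{\Sigma A P : \mathrm{Type}}
\qquad
\textsc{Intro}\;\frac{\Sigma A P : \mathrm{Type} \qquad a : A \qquad p : P(a)}{(a,p)_{\Sigma A P} : \Sigma A P}
\]
\[
\textsc{Elim1}\;\frac{p : \Sigma A P}{p.1 : A}
\qquad
\textsc{Elim2}\;\frac{p : \Sigma A P}{p.2 : P(p.1)}
\]
\[
\textsc{Comp1}\;\frac{(a,p)_{\Sigma A P} : \Sigma A P}{(a,p)_{\Sigma A P}.1 = a : A}
\qquad
\textsc{Comp2}\;\frac{(a,p)_{\Sigma A P} : \Sigma A P}{(a,p)_{\Sigma A P}.2 = p : P(a)}
\]

Fig. 1. LF presentation of Sigma types. 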


For type synthesis to be effective, the pairs, $(a,p)_{\Sigma A P}$, must be heavily typed,¹ 
i.e. carry the annotation $\Sigma A P$. But then the type of the first component can 
be inferred, so I will write $\Sigma P$ and $(a,p)_{\Sigma P}$. 

Sigma can be formalized in Coq (and similarly in Lego) as an inductively 
defined family with a single constructor: 


Inductive sigT [A:Type; P:A->Type]: Type := 
existT: (x:A)(P x) -> (sigT A P). 


The constructor of (sigT A P) is (existT A P); i.e. the pairs are heavily typed, 
as in Fig. 1. The two projections are defined in terms of the inductive elimination 
rule (i.e. by case analysis); e.g. the first projection 


projT1 [A:Type; P:A->Type; H:(sigT A P)]: A := 
Cases H of (existT x _) => x end. 


Unlike Fig. 1, definable projection is heavily typed, but that is an artifact of this 
functional presentation. 


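Example: right association We can represent PER by associating pairs to 
the right. This is usually written out directly 

\[ \mathrm{PER} := \Sigma[S{:}\mathrm{Set}]\,\Sigma[R{:}(S)(S)\mathrm{Prop}]\,\Sigma[\_{:}\mathrm{sym}(S)(R)]\,\mathrm{trn}(S)(R). \tag{4} \]

To clarify, this definition can be written incrementally 

\[
\begin{aligned}
\mathrm{Inner}\,[S{:}\mathrm{Set}][R{:}(S)(S)\mathrm{Prop}] &:= \Sigma[\_{:}\mathrm{sym}(S)(R)]\,\mathrm{trn}(S)(R), \\
\mathrm{Middle}\,[S{:}\mathrm{Set}] &:= \Sigma[R{:}(S)(S)\mathrm{Prop}]\,\mathrm{Inner}(S)(R), \\
\mathrm{PER} &:= \Sigma[S{:}\mathrm{Set}]\,\mathrm{Middle}(S).
\end{aligned}
\tag{5}
\]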


Inside the signature, the fields are named and referred to by bound variables, S 
and R. However, these names are purely local; to refer to the fields of a PER 
object we must use the anonymous first and second projections. To help matters 
we can give names to field projectors. (I use informal dot notation for application 
of these defined projectors, although they are actually functions.) 

¹ For example, an un-annotated pair, (a,p), inhabits both $\Sigma A P$ and $A \times P(a)$, which 
are not equal types. 


S [P:PER] : Set := P.1 
R [P:PER] : (P.S)(P.S)Prop := P.2.1 
sAx [P:PER] : sym(P.S)(P.R) := P.2.2.1 
tAx [P:PER] : trn(P.S)(P.R) := P.2.2.2 


These defined field projectors are global, and cannot be reused globally (e.g. as 
projectors in other packages) without shadowing or other ad hoc solution. 
Here is an inhabitant of the signature defined in (4) 


\[ (T, (Q, (\mathit{symQ}, \mathit{trnQ})_{\mathrm{Inner}(T)(Q)})_{\mathrm{Middle}(T)})_{\mathrm{PER}}. \tag{6} \]
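
For readers who want to experiment, the right-associated signature (5), its named projectors and an inhabitant in the style of (6) can be transcribed into a present-day proof assistant. The following is a sketch in Lean 4 and not the notation of this paper: Lean's `PSigma` stands in for Σ, the namespace `RightAssoc` and the example `eqPER` (equality on the natural numbers) are illustrative choices, and the definitions of `sym` and `trn` are assumed to be the usual ones.

```lean
namespace RightAssoc

-- Assumed meanings of sym and trn for a relation R over a carrier S.
def sym (S : Type) (R : S → S → Prop) : Prop := ∀ x y, R x y → R y x
def trn (S : Type) (R : S → S → Prop) : Prop := ∀ x y z, R x y → R y z → R x z

-- Incremental, right-associated definition, following (5).
def Inner (S : Type) (R : S → S → Prop) : Type :=
  PSigma fun (_ : sym S R) => trn S R
def Middle (S : Type) : Type :=
  PSigma fun (R : S → S → Prop) => Inner S R
def PER : Type 1 :=
  PSigma fun (S : Type) => Middle S

-- Named field projectors, defined from the anonymous projections.
def PER.S (P : PER) : Type := P.1
def PER.R (P : PER) : P.S → P.S → Prop := P.2.1
theorem PER.sAx (P : PER) : sym P.S P.R := P.2.2.1
theorem PER.tAx (P : PER) : trn P.S P.R := P.2.2.2

-- An inhabitant nested as in (6): equality on Nat.
def eqPER : PER :=
  ⟨Nat, ⟨fun x y => x = y,
    ⟨fun _ _ h => Eq.symm h, fun _ _ _ h₁ h₂ => Eq.trans h₁ h₂⟩⟩⟩

-- Applying the parameterized signature to a particular carrier.
def NatPER : Type := Middle Nat

end RightAssoc
```

The last definition shows the point made about definition (5): because the rest of the package is a function of the first field, specializing the carrier is just application.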


Example: left association An alternative is to associate pairs to the left. 


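\[
\begin{aligned}
\mathrm{Rel} &:= \Sigma[S{:}\mathrm{Set}]\,(S)(S)\mathrm{Prop}, \\
\mathrm{symRel} &:= \Sigma[P{:}\mathrm{Rel}]\,\mathrm{sym}(P.1)(P.2), \\
\mathrm{PER} &:= \Sigma[P{:}\mathrm{symRel}]\,\mathrm{trn}(P.1.1)(P.1.2).
\end{aligned}
\tag{7}
\]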


We do not get to use the local field names directly when defining this signature, 
but have to project the fields, e.g. P.1.1. As above, we can define top level names 
for field projectors of this tuple: 


S [P:PER] : Set := P.1.1.1 
tAx [P:PER] : trn(P.S)(P.R) := P.2 

Here is an inhabitant of the signature defined in (7) 

\[ (((T, Q)_{\mathrm{Rel}}, \mathit{symQ})_{\mathrm{symRel}}, \mathit{trnQ})_{\mathrm{PER}}. \tag{8} \]


Ad hoc Association There are two other ways to build a 4-tuple out of pairs, 
and using Sigma types we are free to use any ad hoc association we choose. 
However, it seems that unitary rules for building records, to be discussed below, 
must choose left or right association uniformly. 


Some Differences By giving both right and left associating definitions of PER 
incrementally I have emphasised the duality of these constructions. However this 
“symmetry” is broken because type dependency increases to the right. Here are 
three consequences. 

The nested pairs of (6) and (8) are heavily typed, and there is duplication of 
type information in both cases, so an implementation may optimize structures by 
keeping only the outermost type annotation.² The inner annotations of (8) are 
subterms of outer annotations, so projection from left associating structures is 
cost-free using this optimization. However in (6), the inner annotations are 
substitution instances of outer annotations, so this optimization requires traversing 
types in order to project the second component. 

² Lego does this with built-in Sigma types. 

If we want to add another field to a structure, (e.g. to make PER into 
equivalence relation), it must be added on the right, as it will, in general, depend 
on all the previous fields. Thus (7) can be extended directly and the structure 
will remain left associating and have PER as an immediate substructure. This 
property is called extensibility. On the other hand, extending (4) will either entail 
breaking the right-association or reorganizing the entire structure so that PER 
is no longer a substructure. 

Suppose we want to specialize PER to the natural numbers. The data of 
definition (5) shows directly how the rest of the package depends on the first field; 
e.g. Middle(nat) is the structure we want. Such “application” of parameterized 
signatures is known as Pebble style sharing, and will be discussed further in 
Sect. 5. There is no obvious way to specialize definition (7) without reorganizing 
the entire structure. This point is evidently dual to the previous paragraph. 
The observation that “application” for sharing can be defined directly on nested 
Sigma types appears in [Kam99], without clear understanding that the tuples 
must be right associating. 
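
The contrast between the right and left associating definitions can be made concrete with a short sketch. It is written in present-day Coq syntax rather than the notation used in this paper, and it spells out sym, trn and rfl itself; those definitions, and the names PER_right, PER_nat, PER_left and ER_left, are assumptions made for illustration only.

```coq
(* Assumed definitions of the properties used in the signatures. *)
Definition sym (S : Set) (R : S -> S -> Prop) : Prop :=
  forall x y : S, R x y -> R y x.
Definition trn (S : Set) (R : S -> S -> Prop) : Prop :=
  forall x y z : S, R x y -> R y z -> R x z.
Definition rfl (S : Set) (R : S -> S -> Prop) : Prop :=
  forall x : S, R x x.

(* Right association: each tail is a function of the field before it. *)
Definition Inner (S : Set) (R : S -> S -> Prop) : Type :=
  sigT (fun _ : sym S R => trn S R).
Definition Middle (S : Set) : Type :=
  sigT (fun R : S -> S -> Prop => Inner S R).
Definition PER_right : Type := sigT (fun S : Set => Middle S).

(* Specializing the first field is plain application. *)
Definition PER_nat : Type := Middle nat.

(* Left association: every field projects from the whole prefix. *)
Definition Rel : Type := sigT (fun S : Set => S -> S -> Prop).
Definition symRel : Type :=
  sigT (fun P : Rel => sym (projT1 P) (projT2 P)).
Definition PER_left : Type :=
  sigT (fun P : symRel =>
          trn (projT1 (projT1 P)) (projT2 (projT1 P))).

(* Extending on the right keeps PER_left as the immediate first component. *)
Definition ER_left : Type :=
  sigT (fun P : PER_left =>
          rfl (projT1 (projT1 (projT1 P)))
              (projT2 (projT1 (projT1 P)))).
```

In this sketch PER_nat is obtained without touching the rest of the package, while ER_left is obtained without reorganizing PER_left; neither operation has an equally direct counterpart on the other association.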


2.2 Inductive Telescopes 


The inductive definition of Sigma types as 2-tuples, with projections programmed
in terms of inductive elimination, can be extended to arbitrary telescopes.
If

$T = [x_1 : A_1][x_2 : A_2(x_1)] \cdots [x_k : A_k(x_1, \ldots, x_{k-1})]$

is a telescope, then there is an inductively defined type ΣT, with a single
constructor, given by the formation and introduction rules


FORM: $\dfrac{A_1 : Type \qquad A_2 : (A_1)Type \qquad \cdots \qquad A_k : (x_1 : A_1;\ x_2 : A_2(x_1);\ \ldots;\ x_{k-1} : A_{k-1}(x_1, \ldots, x_{k-2}))Type}{\Sigma T : Type}$ (9)

INTRO: $\dfrac{\Sigma T : Type \qquad a_1 : A_1 \qquad a_2 : A_2(a_1) \qquad \cdots \qquad a_k : A_k(a_1, \ldots, a_{k-1})}{(a_1, \ldots, a_k)_{\Sigma T} : \Sigma T}$ (10)


The premises of (9) say exactly that T is well-typed, and the premises of (10)
say that $(a_1, \ldots, a_k)$ fits into T. In Coq (and similarly in Lego) there is special
syntax for inductive telescopes. For example, one can define PER as


Record PER: Type :=
{ S:Set; R:S->S->Prop; Sym:(sym S R); Trn:(trn S R) }.


which generates an inductive definition


Inductive PER : Type :=
Build_PER : (S:Set; R:S->S->Prop) (sym S R)->(trn S R)->PER.


having one constructor, Build_PER. The field names, S, R, ..., are bound variables.
To access the fields from outside the package, one must use the anonymous
eliminator Cases. For example, S is defined (approximately) as


S [x:PER]: Set := Cases x of (Build_PER S R _ _) => S end.


Coq tries to automatically define these projectors, and name them with the 
associated bound variable name, but this will fail if that name is already used 
at top-level. 


Uniform Packages For every k we can define the general telescope of length k.
Σ (Sect. 2.1) is the general 2-telescope. The general 4-telescope is programmed
in Coq as


Record Sig4 [A1: Type;
A2: A1->Type;
A3: (a1:A1)(A2 a1)->Type;
A4: (a1:A1; a2:(A2 a1))(A3 a1 a2)->Type] : Type :=
{ a1:A1; a2:(A2 a1); a3:(A3 a1 a2); a4:(A4 a1 a2 a3) }.


This definition exactly captures rules (9) and (10). PER can be defined as


Definition PER': Type :=
(Sig4 Set
[S:Set] (S->S->Prop)
[S:Set; R:S->S->Prop] (sym S R)
[S:Set; R:S->S->Prop; _:(sym S R)](trn S R)).


It is also possible to define general telescopes in terms of shorter general
telescopes in various ways, as we defined PER in several ways in terms of pairs.


Some Differences. PER above is not heavily typed, since Build_PER can only 
construct a PER. E.g. (Build_PER T Q symQ trnQ) inhabits PER. On the other 
hand, PER’ is heavily typed, and its constructor must be given type annotations. 
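
A minimal sketch of the two styles in present-day Coq syntax may help here. The field names carrier, rel, rel_sym and rel_trn, the constructor name Build_Sig4, the fields f1 to f4, and the example on nat with equality are all assumptions made for illustration; sym and trn are spelled out so that the fragment is self-contained.

```coq
Definition sym (S : Set) (R : S -> S -> Prop) : Prop :=
  forall x y : S, R x y -> R y x.
Definition trn (S : Set) (R : S -> S -> Prop) : Prop :=
  forall x y z : S, R x y -> R y z -> R x z.

(* A dedicated record: its constructor can only build a PER. *)
Record PER : Type := Build_PER {
  carrier : Set;
  rel : carrier -> carrier -> Prop;
  rel_sym : sym carrier rel;
  rel_trn : trn carrier rel
}.

(* The general 4-telescope, parameterized by the four field types. *)
Record Sig4 (A1 : Type) (A2 : A1 -> Type)
            (A3 : forall a1 : A1, A2 a1 -> Type)
            (A4 : forall (a1 : A1) (a2 : A2 a1), A3 a1 a2 -> Type)
  : Type :=
  Build_Sig4 { f1 : A1; f2 : A2 f1; f3 : A3 f1 f2; f4 : A4 f1 f2 f3 }.

Definition PER' : Type :=
  Sig4 Set
       (fun S => S -> S -> Prop)
       (fun S R => sym S R)
       (fun S R _ => trn S R).

(* Equality on nat as an example relation. *)
Definition eq_nat_sym : sym nat (@eq nat) := fun x y H => eq_sym H.
Definition eq_nat_trn : trn nat (@eq nat) :=
  fun x y z H1 H2 => eq_trans H1 H2.

(* No annotations are needed for the dedicated record ... *)
Definition natPER : PER := Build_PER nat (@eq nat) eq_nat_sym eq_nat_trn.

(* ... while the general constructor is given the four field types. *)
Definition natPER' : PER' :=
  Build_Sig4 Set
             (fun S => S -> S -> Prop)
             (fun S R => sym S R)
             (fun S R _ => trn S R)
             nat (@eq nat) eq_nat_sym eq_nat_trn.
```

In practice Coq could infer the four type arguments of Build_Sig4 from the expected type; they are written out here only to show what the heavily typed constructor carries.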

Name equality vs structure equality. Due to the way Coq and Lego create
inductive types, PER is different from any other type, so can be made abstract
by hiding its constructor, as is done in programming language module systems.
However PER’ is definitionally equal to any type that is Sig4 applied to arguments
that convert with those in the definition of PER’. To make PER’ abstract
would require hiding the Sig4 constructor. Similarly, any structure defined by
specializing a more general package admits extra type equalities.


FORM: $\dfrac{L : Type \qquad A : (L)Type}{(L,\ r : A) : Type}$

INTRO: $\dfrac{(L,\ r : A) : Type \qquad l : L \qquad a : A(l)}{(l,\ r = a) : (L,\ r : A)}$

Elimination: projection and restriction of the visible label.

EL-TR: $\dfrac{l : (L,\ r : A)}{l|r : L}$

EL-TP: $\dfrac{l : (L,\ r : A)}{l.r : A(l|r)}$

Elimination: passing the operations below top level.

EL-LR: $\dfrac{l : (L,\ r : A) \qquad (l|r)|p : P}{l|p : P}\ \ r \neq p$

EL-LP: $\dfrac{l : (L,\ r : A) \qquad (l|r).p : P}{l.p : P}\ \ r \neq p$

Computation rules.

C-TR: $\dfrac{(l,\ r = a) : (L,\ r : A)}{(l,\ r = a)|r = l : L}$

C-TP: $\dfrac{(l,\ r = a) : (L,\ r : A)}{(l,\ r = a).r = a : A(l)}$

C-LR: $\dfrac{l : (L,\ r : A) \qquad (l|r)|p : P}{(l|r)|p = l|p : P}\ \ r \neq p$

C-LP: $\dfrac{l : (L,\ r : A) \qquad (l|r).p : P}{(l|r).p = l.p : P}\ \ r \neq p$


Fig. 2. Rules for left associating records. 


3 Dependently Typed Records 


Betarte and Tasistro [BT98,Bet98,Tas97] give an extension of Martin-Löf’s framework
to extensible, dependently typed records, with record subtyping. They
had a prototype implementation, and worked interesting examples. However
their presentation, in a complicated framework, with apparently essential use
of subtyping, makes their system hard to understand. Here I give a simplified
version of their system, without subtyping or explicit substitutions, which turns
out to be straightforward left associating records. (I am informal about the difference
between Type and Set.) Record subtyping will be treated orthogonally
(using coercive subtyping) in Sect. 4. I also present right associating records,
which are reasonable too.


3.1 Left Associating Records 


A left associating record is a pair, with a labelled second component. It responds
only to its own label, but passes any other label on to its first component.
Record types (signatures) have syntax (L, r:A). Records (structures) have syntax
$(l,\ r = a)_{(L,\ r : A)}$. I will henceforth omit type annotations on records, which are,
however, necessary in this setting for type synthesis to be effective. The rules
are shown in Fig. 2.

Labels do not bind, either in signatures or in records. In the notation (l, r=a),
“r=a” is only syntax, not a local definition. It is more precise to say that the
label is part of the signature, hence of the type annotation of a record.


Formation and Introduction In rules FORM and INTRO, L may be a record
type, and may have a field r. Later fields shadow earlier fields with the same
label. Allowing repeated labels in signatures is not completely satisfactory, but
simplifies the presentation, while allowing signatures to be first-class.³ Betarte
and Tasistro use a kind, record-type, to enforce that if p: record-type is known,
then all the labels of p are known, so side conditions about freshness of labels
make sense. Our presentation could be modified to enforce fresh labels in a
similar way.

Similarly, for us no empty record type is required; an informal notion of 
“pure record type” is obtained by starting from the unit type (which I write as 
(), although it is not a record). 


Elimination and Computation There are two record eliminators, projection “l.r”
and restriction “l|r”. The side condition r ≠ p is needed to force the top-level
rules to be used whenever possible, as r may be a field in l|r. In rules EL-LR
(resp. EL-LP) we don’t know that L is a record type, but l|p (resp. l.p) is only
well typed if we already know that (l|r)|p (resp. (l|r).p) is well typed, which can
only happen if L is (equal to) a record type with a field labelled p.

Betarte and Tasistro use record subtyping to explain projection: if l: L, and
r:A is a field in L, they conclude l.r : A(l). That is, the only use A can make of
l is to project it at some fields that precede r in L, and l has at least those fields
that precede r. Our restriction operation, which is not explicit for Betarte and
Tasistro, allows us to write the elimination rules (e.g. EL-TP) prior to a notion
of record subtyping.

The computation rules are as expected, given the elimination rules. I have 
written them as typed equality. While rules C-TR and C-TP can be implemented 
by syntactic reduction, rules C-LR and C-LP apparently require run-time type 
checking because of their second premises. 


Records vs Sigma It is clear from our presentation that left associating signatures
are just Sigma types carrying labels: erase “r” everywhere and read “|” and “.”
as “.1” and “.2” respectively. Thus they can be nested however you wish; we call
them left associating because rule EL-LP searches from right to left. They could
have an anonymous restriction operator (just drop rule EL-LR), but whenever
we know that some object has record type, we already know its label.


Example: PER in left associating records In “official” syntax we write


set := ((), S:[_: ()]Set)
Rel := (set, R:[x: set](x.S)(x.S)Prop)
symRel := (Rel, sAx:[x: Rel]sym(x.S)(x.R))
PER := (symRel, tAx:[x: symRel]trn(x.S)(x.R))


Binding of variables for local field names is treated the same as in (7), although
here, because of rules C-LR and C-LP, the depth of projections does not increase
(“trn(x.S)(x.R)” instead of “trn(P.1.1)(P.1.2)”). The field labels are new, but
they do not interact with local names. This structure could be defined with three
nested pairs, as in (7), but then the first field would not have a label; it would
still be accessible as |R.


³ E.g. [L: Type, A:(L)Type](L, r: A) is well-typed, although L may have label r.


FORM: $\dfrac{A : Type \qquad L : (A)Type}{\{r : A,\ L\} : Type}$

INTRO: $\dfrac{\{r : A,\ L\} : Type \qquad a : A \qquad l : L(a)}{\{r = a,\ l\} : \{r : A,\ L\}}$

EL-TR: $\dfrac{l : \{r : A,\ L\}}{l|r : L(l.r)}$

EL-TP: $\dfrac{l : \{r : A,\ L\}}{l.r : A}$

Signature abstraction:

S-TR: $\dfrac{\{r : A,\ L\} : Type}{\{r : A,\ L\}| = L : (A)Type}$

S-TP: $\dfrac{\{r : A,\ L\} : Type}{\{r : A,\ L\}. = A : Type}$


Fig. 3. Some rules for right associating records.

Writing this signature without intermediate definitions, and removing inner 
brackets and redundant type tags, we get a nearly acceptable notation: 


PER := (S:[_]Set, R:[x](x.S)(x.S)Prop, sAx:[x]sym(x.S)(x.R),
tAx:[x]trn(x.S)(x.R)). (11)


An official inhabitant of (11) is ((((*, S=T), R=Q), sAx=symQ), tAx=trnQ),
where * : (). Omitting redundant information, we write


(S=T, R=Q, sAx= symQ, tAx= trnQ). (12) 
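
The reading of left associating records as Sigma types with the labels erased can be sketched in present-day Coq syntax. Restriction becomes projT1 and projection becomes projT2, and a labelled access such as x.S becomes a fixed chain of projections. The names set0, Rel0, symRel0, PER0, dotS and dotR, and the example built from nat with equality, are assumptions made for illustration; sym and trn are spelled out so that the fragment is self-contained.

```coq
Definition sym (S : Set) (R : S -> S -> Prop) : Prop :=
  forall x y : S, R x y -> R y x.
Definition trn (S : Set) (R : S -> S -> Prop) : Prop :=
  forall x y z : S, R x y -> R y z -> R x z.

(* The four signatures of the example, starting from the unit type. *)
Definition set0 : Type := sigT (fun _ : unit => Set).
Definition Rel0 : Type :=
  sigT (fun x : set0 => projT2 x -> projT2 x -> Prop).
Definition symRel0 : Type :=
  sigT (fun x : Rel0 => sym (projT2 (projT1 x)) (projT2 x)).
Definition PER0 : Type :=
  sigT (fun x : symRel0 =>
          trn (projT2 (projT1 (projT1 x))) (projT2 (projT1 x))).

(* What the labelled projections .S and .R compute to on a PER0. *)
Definition dotS (p : PER0) : Set :=
  projT2 (projT1 (projT1 (projT1 p))).
Definition dotR (p : PER0) : dotS p -> dotS p -> Prop :=
  projT2 (projT1 (projT1 p)).

(* An inhabitant, built one field at a time from the unit element. *)
Definition eq_nat_sym : sym nat (@eq nat) := fun x y H => eq_sym H.
Definition eq_nat_trn : trn nat (@eq nat) :=
  fun x y z H1 H2 => eq_trans H1 H2.

Definition s0 : set0 := existT _ tt nat.
Definition r0 : Rel0 := existT _ s0 (@eq nat).
Definition sr0 : symRel0 := existT _ r0 eq_nat_sym.
Definition per0 : PER0 := existT _ sr0 eq_nat_trn.

(* The carrier is recovered by computation alone. *)
Example dotS_per0 : dotS per0 = nat := eq_refl.
```

The sketch hard-wires the depth of each access in dotS and dotR; the point of rules EL-LP and C-LP is that labels let the system find that depth instead.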


3.2 Right Associating Records 


I presented left associating records first, as they are close to the work of Betarte 
and Tasistro. However, from the viewpoint that records are just Sigma types 
with labels, right associating records are also natural. I guess that Betarte and 
Tasistro chose left association to achieve extensibility. 

A right associating record is a pair, with a labelled first component. It responds
only to its own label, but passes any other label on to its second component.
Record types have syntax {r: A, L}. Records have syntax $\{r = a,\ l\}_{\{r : A,\ L\}}$. I
will henceforth omit type annotations on records, which are, however, necessary
for type synthesis to be effective. Again, “r=a” is not a local definition: r does
not bind in l.

The rules are analogous to left associating records of Sect. 3.1, so I only 
present some of them in Fig. 3. 


Signature abstraction We can define operations of projection and restriction on
signatures (rules S-TR and S-TP of Fig. 3). Restriction, {r: A, L}|, takes a closed
package, and shows how it functionally depends on its first field. This operation
supports the signature application of [Kam99], and hence Pebble-style sharing
(Sects. 1 and 5). This is the main reason to be interested in right associating
records.


Example: PER in right associating records In “official” syntax we write


PER := {S: Set, [s: Set]
{R: (s)(s)Prop, [r: (s)(s)Prop]
{sAx: sym(s)(r), [_: sym(s)(r)]
{tAx: trn(s)(r), [_: trn(s)(r)]()}}}}.


Removing inferable type annotations and internal brackets, we can write


PER := {S: Set, [s]R: (s)(s)Prop, [r]sAx: sym(s)(r), [_]tAx: trn(s)(r)}. (13)


Binding of variables for local names is treated the same as in (4). An official
inhabitant of (13) is {S=T, {R=Q, {sAx=symQ, {tAx=trnQ, *}}}}. Omitting
redundant information, we write {S=T, R=Q, sAx=symQ, tAx=trnQ}.


3.3 Labels and Variables


Labels are the global accessors for records, and hence cannot be alpha converted.
Thus labels and variables cannot be the same syntactic class in dependent record
types, or else how could we substitute y for x in (y:Set, z:x)? To my knowledge,
the first satisfactory handling of this problem is in [HL94], where every field has
both a label (that does not bind) and a variable (that does bind). PER would
be written as


(S ▷ s:Set, R ▷ r:(s)(s)Prop, sAx ▷ _:sym(s)(r), tAx ▷ _:trn(s)(r)). (14)


The approaches I give above, inspired by [BT98], are more parsimonious than 
[HL94] by using the existing dependent function type instead of introducing a 
new binding construct. Nonetheless, all three notations with labels and variables 
(i.e. (11), (13) and (14)) are heavy. Betarte and Tasistro point out that an
informal notation, e.g.


(S: Set, R:(S)(S)Prop, sAx: sym(S)(R), tAx: trn(S)(R)), (15)


can be translated mechanically to the formal notations. This is true, but some 
signatures cannot be represented in this informal manner, as an example from 
[HL94] shows: 


left associating: (b: [_: ()]Type, c: [x](b: [_: ()]Type, f: [y](y.b)x.b)),
right associating: {b: Type, [x: Type]{b: Type, [y: Type]f: (y)x}},
Harper/Lillibridge: (b ▷ x:Type, c ▷ _:(b ▷ y:Type, f ▷ _:(y)x)).


Here, the variables x, y, are needed to disambiguate the two labels b. Thus it
seems unlikely that a notation such as (15) could be a satisfactory formalization.
Nonetheless, some well known proof tools (e.g. PVS) present dependent record
types in this notation.

In the three formal approaches, left- and right- associating records, and
Harper/Lillibridge style, labels and variables are distinct syntactic classes. But there
can be no confusion, even if they are implemented with the same concrete type,
as labels and variables only appear in different syntactic positions.


3.4 Dependent Records? 


Both Harper/Lillibridge and Courant [Cou99] present structures that are dependent
as well as dependently typed; i.e., field variables can be used in later fields as
local definitions. For example, one could write (n ▷ x=3, m ▷ _=x).⁴ However,
this record is definitionally equal to (n ▷ x=3, m ▷ _=3), so no judgement can
distinguish between the dependent presentation and the flattened one.

It is evident from rules INTRO of Figs. 2 and 3 that fields in my left and
right associating records cannot depend on previous fields. But my records
are equally expressive: I write (n=3, m=3) at the same type as the example
above. To preserve sharing, I might take (n=3, m=n) as syntactic sugar for
let n = 3 in (n=n, m=n), where n is a local variable that must be sufficiently
fresh. Nonetheless, the dependency in Harper/Lillibridge structures does seem 
useful, even if I cannot say precisely what the difference is. I guess it is relevant 
here to mention that Harper/Lillibridge structures are not extensible: once a 
structure is closed there is no “potential” future use of its field variables. 


4 Coercions 


Presentations such as [HL94,BT98,Cou99] have a built-in notion of structural 
subtyping: roughly, any well-typed permutation of the fields of an extension of a 
signature is a subtype of that signature. Further, one allows corresponding fields 
themselves to be in the subtype relation. Whenever an object of a certain type is 
required, an object of a subtype is accepted: e.g. you can use a group wherever a 
monoid is expected. Below, I show how this approach is limited when structures 
are not “flat”, but are constructed from substructures, as is desirable in practice. 

Peter Aczel [Acz94] suggested a notion of coercive subtyping for type theory. 
Both Coq and Lego now support ad hoc, but very useful, notions of coercive 
subtyping [Sai97,Bai98]. Zhaohui Luo and colleagues [Luo99,LS99] have studied
coercive subtyping foundationally, as an abbreviational mechanism which introduces
definitional equalities at a logical framework level. This approach is more
expressive in some ways than the implementations, but in other respects does
not yet equal them. Also, we do not claim that all the usability and implementability
issues of the logical framework (typed equality judgement, universes à
la Tarski, etc.) are worked out.


⁴ This style requires field variables in structures as well as in signatures.


As an example of coercive subtyping, reconsider equation (5), using coercions
in Coq notation. Define Inner as a parameterized record, and use it as a field
in a structure for PER


Record Inner [S:Set; R:S->S->Prop]: Type := 
{ Sym: (sym S R); Trn: (trn S R) }. 
Record PER: Type := { S:>Set; R:>S->S->Prop; i:>(Inner S R) }. 


The notation S:>Set means that S:PER->Set is treated as a coercion from PER
to Set, so that if p:PER we can write x:p to mean x ranging over the carrier of
p, i.e. x:(S p), with the typechecker invisibly inserting the coercion S. Similarly
R: (p: PER) p->p->Prop is a coercion from p:PER to relations over the carrier of
p. (We can write p instead of (S p) in the type of R.) The statement about a
particular p:PER that its relation is reflexive can be written as (x:p)(p x x).

Similarly i:>(Inner S R) means that i:(p:PER)(Inner (S p) (R p))
(which can be written as i: (p: PER) (Inner p)) is treated as a coercion from PER
to Inner, so that whenever p:PER, the field projections Sym and Trn of Inner
can be applied directly to p, with the typechecker invisibly inserting the coercion
i. Thus PER appears to be a subtype of Inner.⁵

Structural subtyping cannot show PER as a subtype of Inner, because PER
is not a permutation of an extension of Inner. In practice we often build new
structures out of existing structures in this way (e.g. ringSig in Sect. 5).
Structural subtyping depends on accidents of structure, and does not support natural
mathematical definitions.

In this example the coercion i opens the Inner structure for users of the 
structure PER. This opening is transitive, in the sense that if we use PER in a 
larger structure, e.g. 


Record ER: Type := { p:>PER; Rfl:(rfl p) }.


then we can still project the fields of Inner from an ER object. This also shows 
that extensibility is not very important, given coercive subtyping. 
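
The same development can be sketched in present-day Coq syntax. The field names car, rel, inn and per, and the definitions of sym, trn and rfl, are assumptions chosen for this illustration (the carrier field is not called S here only to avoid shadowing the successor constructor of nat); the Arguments commands make the parameters of Sym and Trn implicit, which is what allows the projections to be applied to a PER directly.

```coq
Definition sym (S : Set) (R : S -> S -> Prop) : Prop :=
  forall x y : S, R x y -> R y x.
Definition trn (S : Set) (R : S -> S -> Prop) : Prop :=
  forall x y z : S, R x y -> R y z -> R x z.

Record Inner (S : Set) (R : S -> S -> Prop) : Type :=
  { Sym : sym S R; Trn : trn S R }.
Arguments Sym {S R} _.
Arguments Trn {S R} _.

(* Each field marked :> is declared as a coercion. *)
Record PER : Type :=
  { car :> Set; rel :> car -> car -> Prop; inn :> Inner car rel }.

(* Reflexivity of p's relation; Coq reads this as
   forall x : car p, rel p x x. *)
Definition rfl (p : PER) : Prop := forall x : p, p x x.

(* A larger structure that contains a PER as a coerced field. *)
Record ER : Type := { per :> PER; Rfl : rfl per }.

(* The chain ER >-> PER >-> Inner opens the inner fields of an ER. *)
Definition er_sym (e : ER) : sym (car e) (rel e) := Sym e.
Definition er_trn (e : ER) : trn (car e) (rel e) := Trn e.
```

In er_sym the typechecker is expected to insert both per and inn around e, which is the transitive opening described above.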

In practice we want renaming to support opening substructures with common 
labels; e.g. the additive and multiplicative monoids of a ring. 


Subtyping Sigma Types Since records are constructed like nested Sigma
types, it is useful to ask how subtyping propagates through Sigma. Suppose L
is a subtype of L’ by coercion c: (L)L’, and A: (L’)Type. c can be lifted to
a coercion $c_\Sigma : (\Sigma_L\, A \circ c)\,\Sigma_{L'} A$ showing that $\Sigma_L\, A \circ c$ is a subtype of $\Sigma_{L'} A$,
defined by $c_\Sigma((l, a)) := (c(l), a)$ [Luo99]. Note how the composition, A∘c, is
needed for the subtype to be well formed, as A expects an L’-object, not an
L-object. This idiom will appear again in Sect. 5.3.
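
The lifting of a coercion through Sigma is short enough to write out. The following sketch uses present-day Coq syntax and the standard sigT type; the name lift_sigma and the accompanying lemma are assumptions made for illustration.

```coq
(* Lift c : L -> L' to a map from Sigma over (A o c) to Sigma over A. *)
Definition lift_sigma {L L' : Type} (c : L -> L') (A : L' -> Type)
           (p : sigT (fun l : L => A (c l))) : sigT A :=
  existT A (c (projT1 p)) (projT2 p).

(* The first component is coerced by c; the second is left unchanged. *)
Lemma lift_sigma_fst {L L' : Type} (c : L -> L') (A : L' -> Type)
      (p : sigT (fun l : L => A (c l))) :
  projT1 (lift_sigma c A p) = c (projT1 p).
Proof. reflexivity. Qed.
```

The second component can be reused as it stands only because the source family is the composition of A with c, which is the well-formedness point made above.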


⁵ The illusion is not complete, as Inner is parameterized, and the projections must be
applied to the parameters: (Trn (S p) (R p) p). However, with implicit arguments,
we can write (Trn p).


5 Manifest Signatures


Programming language designers have long recognized the need to see through 
the signatures of modules; [LB88,Mac86,Ler94,HL94] give a taste of the relevant 
literature. I assume the reader is familiar with the two basic approaches, Pebble 
style sharing which uses pure abstraction and application, and manifest types in 
signatures, which requires some new explanation. (I use the term manifest types, 
from [Ler94], informally, not referring to particulars of that paper.) 


Expressive Theorems With any of the packages above, the lemma that the
dual of a PER is also a PER could be stated as (PER)PER, but this formulation
is too coarse to express our meaning. E.g. the identity function is also a proof of
(PER)PER but is not the duality construction.

We have seen the application of a parameterized signature (Pebble-style sharing)
in Sect. 1. This is convenient for right associating structures [Kam99]. Using
(5) as the definition of PER, the duality theorem can be stated as


(p: PER) Inner(p.S)(dual(p.S)(p.R)). 


This shows the duality explicitly, but doesn’t actually return a PER, so operations
on the PER package cannot be applied to the structure returned. E.g. we
cannot use this theorem directly to prove that dualization is involutive.

A manifest signature expresses the intended meaning


(p: PER)(S= p.S: Set, R= dual(S)(p.R): (S)(S)Prop, sAx: sym(S)(R), ...) (16)


but forces us to rewrite the definition of PER, which is error prone and obscures 
the statement. (Since S is manifest in (16), we use S in place of its value in 
succeeding fields.) 

What is needed is something like the with notation [Ler94] to add information 
about a signature. The duality lemma could be stated as 


(p: PER)PER with S=p.S with R= dual(S)(p.R). (17) 


This is often considered syntactic sugar for (16), but there are some details to 
make precise (Sect. 6). 

This example shows that ordinary theorems involve functions from structures 
to structures. Thus I am interested in first-class records with manifest types, and 
guess that stratified module systems [Cou99] will be inadequate in practice. 


Sharing Suppose monSig is the signature of monoids, and grpSig is the signature
of groups. How can we define ringSig as an (additive) group and a (multiplicative)
monoid, sharing the same carrier set, and having some axioms connecting
the two operations? We might be satisfied with Pebble-style sharing, “applying”
monSig to the carrier of the group. Although the objection still arises that in
this approach no multiplicative monoid actually occurs as a substructure of a
ring, attempts to formalize some algebra in this way have made interesting progress
[Pot99]. Nonetheless, Pebble-style sharing is heavy, and doesn’t scale up in
practice. What is needed is the with notation, allowing us to write


ringSig := (G: grpSig, M: monSig with crr=G.crr, ...).


FORM: $\dfrac{\Sigma_L A : Type \qquad a : (l : L)A(l)}{\Psi_L A\,a : Type}$

INTRO: $\dfrac{\Psi_L A\,a : Type \qquad l : L}{(l)_{\Psi_L A\,a} : \Psi_L A\,a}$

Computation rules.

COMP1: $\dfrac{(l)_{\Psi_L A\,a} : \Psi_L A\,a}{(l)_{\Psi_L A\,a}.1 = l : L}$

COMP2: $\dfrac{l : \Psi_L A\,a}{l.2 = a(l.1) : A(l.1)}$


Fig. 4. Some rules for manifest left associating Sigma.


5.1 Definable Manifest Signatures 


Left Associating Using single constructor inductive definitions we can add a
value specification to the right field of a Sigma type (Fig. 4). We call these left
associating only because we intend to use them that way, as the example below
shows. ΨAa doesn’t say what the value of the second field is, but constrains it
uniformly as a function of the first field. As before, I will write ΨAa instead of
$\Psi_L A\,a$, and even more informally, (l), for $(l)_{\Psi A a}$.

In Coq notation this type is defined


Record Psi [L:Type; A:L->Type; a:(l:L)(A l)]: Type := { psi1:L }.


The first projection, rule COMP1, is defined by inductive elimination as before,
but the second projection, rule COMP2, is defined using the heavy type annotations


Definition psi2 [L:Type; A:L->Type; a:(l:L)(A l); h:(Psi ? ? a)]
: (A (psi1 ? ? ? h)) := (a (psi1 ? ? ? h)).


This strong rule shows that ΨAa has its second field manifest.

From INTRO it is clear that ΨAa is isomorphic with L. From FORM, it is
clear that ΨAa is a “subtype” of ΣA; there is a definable coercion that makes
this subtyping implicit in Coq


Definition Psi_sigT [L:Type; A:L->Type; a:(l:L)(A l); h:(Psi ? A a)]
: (sigT ? A) := (existT ? ? (psi1 ? ? ? h) (psi2 ? ? ? h)).
Coercion Psi_sigT: Psi >-> sigT.


Example Surprisingly, although every ΨAa is uniformly typed (i.e. ΨAa : Type
implies A: (L)Type), it is possible to construct non-uniform signatures. For example,
although the informal signature (K: Type, A:K, a: A) is not well-typed
(e.g. if K= (Set)Set: Type), the manifest signature (K=Set:Type, A: K, a: A)
is well-typed by the construction


L1 := Ψ([_:()]Type)([_:()]Set) representing ((), K=Set:Type),
L2 := Σ[x: L1]x.2 representing (L1, A: K),
L3 := Σ[x: L2]x.2 representing (L2, a: A).


By rule COMP2, the occurrence of x.2 in the definition of L2 has value Set.
L3 is inhabited by the tuple $(((*)_{[\_:()]Set},\ nat),\ 0)$.
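
The manifest Sigma type and this non-uniform example can be sketched in present-day Coq syntax. The constructor name mkPsi and the names k1, k2, k3 and K_manifest are assumptions made for illustration; the unit type plays the role of ().

```coq
(* A one-field record whose "second field" is fixed by the parameter a. *)
Record Psi (L : Type) (A : L -> Type) (a : forall l : L, A l) : Type :=
  mkPsi { psi1 : L }.
Arguments psi1 {L A a} _.

Definition psi2 {L : Type} {A : L -> Type} {a : forall l : L, A l}
           (h : Psi L A a) : A (psi1 h) := a (psi1 h).

(* Forgetting the manifest value gives an ordinary Sigma type. *)
Definition Psi_sigT {L : Type} {A : L -> Type} {a : forall l : L, A l}
           (h : Psi L A a) : sigT A :=
  existT A (psi1 h) (psi2 h).
Coercion Psi_sigT : Psi >-> sigT.

(* ((), K=Set:Type), then (L1, A:K), then (L2, a:A). *)
Definition L1 : Type := Psi unit (fun _ => Type) (fun _ => Set).
Definition L2 : Type := sigT (fun x : L1 => psi2 x).
Definition L3 : Type := sigT (fun x : L2 => projT2 x).

(* The field K is manifest: it is Set for every inhabitant of L1. *)
Definition K_manifest : forall x : L1, psi2 x = Set :=
  fun _ => eq_refl.

(* An inhabitant of L3, built one field at a time. *)
Definition k1 : L1 := mkPsi _ _ _ tt.
Definition k2 : L2 := existT _ k1 nat.
Definition k3 : L3 := existT _ k2 0.
```

L2 is accepted only because psi2 x computes to Set, so that it can be used as a family of types; with an opaque first field of type Type the same definition would be rejected.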


Other Definable Manifest Telescopes The same trick can be used to make
fields manifest in any inductive telescope, and in practice this could be more
convenient than building structures out of nested Σ and Ψ. However, as for
Ψ, such manifest telescopes must be well-typed uniformly, so to construct non-uniform
structures such as the last example, we must nest manifest telescopes.
In this sense Ψ is the most general manifest telescope.


5.2 Subtyping Manifest Signatures 


In the motivating example of PER duality, in order to view theorem (16) as
returning a PER, I need a coercion that erases some manifest type information,
so that the type of the structure returned in (16) can be seen as a subtype of
PER. We have just seen the coercion Psi_sigT that does this for Ψ, returning
the corresponding Σ. However there can be no coercion that allows viewing
(K=Set:Type, A:K, a:A) as a subtype of (K:Type, A: K, a: A), since the
latter is not even well typed. The usual rule for subtyping manifest signatures
(e.g. [HL94,Cou99]) has a premise requiring the (less manifest) supertype to be
well typed.

In my presentation, the coercion forgetting a particular manifest field is constructed
by applying Psi_sigT at that field (which actually forgets the manifest
value, and is always well typed), then successively lifting the coercion through
the following fields using $c_\Sigma$ (Sect. 4). The typing of $c_\Sigma$ checks at each stage
that the supertype is well typed, and this approach can be seen as analysing the
usual rule into two unitary rules Psi_sigT and $c_\Sigma$.


5.3 Manifest Left Associating Record Types 


Just as we added labels to Σ in Sect. 3.1, so we add labels to Ψ. Manifest
signatures have syntax (L, r=a:A). r is opaque in (L, r:A), and manifest
in (L, r=a:A). Objects inhabiting manifest signatures officially have syntax
$(l)_{(L,\ r = a : A)}$, but I will write them (l, r=a(l)).⁶ Some rules are given in Fig. 5.


⁶ An implementation might take this abuse of notation seriously, i.e. internalize the
coercion ((L, r=a:A))(L, r:A).


MFORM: $\dfrac{(L,\ r : A) : Type \qquad a : (l : L)A(l)}{(L,\ r = a : A) : Type}$

MINTRO: $\dfrac{(L,\ r = a : A) : Type \qquad l : L}{(l,\ r = a(l)) : (L,\ r = a : A)}$

Manifest computation.

C-MAN: $\dfrac{l : (L,\ r = a : A)}{l.r = a(l|r) : A(l|r)}$


Fig. 5. Some rules for manifest left associating records. 


By inversion of MFORM, whenever (L, r=a:A) is well typed, so is (L, r:A).
In general, however, well typedness of ((L, r=a:A), s:B) does not guarantee well
typedness of ((L, r:A), s:B), as the typing of B in the former may depend on
the value of r.

(L, r=a:A) is more informative than (L, r:A), so the elimination and computation
rules of Sect. 3.1 also hold for (L, r=a:A). We can define the coercion
((L, r=a:A))(L, r:A) (Psi_sigT in Sect. 5.1) from the projection and restriction
operations of (L, r=a:A). Thus l: (L, r=a:A) can be used as if it had
type (L, r:A).

There is a new computation rule for the projection of a manifest field, C-MAN.
It is admissible that rule C-MAN goes underneath the top constructor. For example,
suppose l : (L, p=b:B, r:A); we have


$$\dfrac{\dfrac{\dfrac{l : (L,\ p = b : B,\ r : A)}{l|r : (L,\ p = b : B)}\ \text{EL-TR}}{(l|r).p = b((l|r)|p) : B((l|r)|p)}\ \text{C-MAN} \qquad \dfrac{l : (L,\ p = b : B,\ r : A)}{(l|r).p = l.p}\ \text{C-LP} \qquad \dfrac{l : (L,\ p = b : B,\ r : A)}{(l|r)|p = l|p}\ \text{C-LR}}{l.p = b(l|p) : B(l|p)}$$


The with Notation “with” is explained in terms of manifest fields (Fig. 6).
(L, r:A) with r=a is defined as (L, r=a:A) whenever the latter is well-typed
(rule WITH-DEF). Having explained how to apply with to some record, L, we
explain how to apply it to a longer record (rules WITH-RO and WITH-RM).
Moving with to the right in a signature loses information: the typing of B in
(L with r=a, p: B) may use that r=a, while in (L, p: B) with r=a it may not.

Since L with r=a is constructed by adding manifest information to a well
typed signature, L, the difficulty of forgetting manifest information (Sect. 5.2)
does not arise. There is a uniform coercion $u_r^a$ : (L with r=a)L, which must be
defined mutually with with, as it appears in the conclusions of rules WITH-RO
and WITH-RM. The reason for this coercion in these rules is the lifting of subtyping
through a signature, introduced in Sect. 4. The uniform definition of $u_r^a$
shows that with-signatures are well behaved with respect to subtyping. E.g. (17)
obviously returns a subtype of PER, while this fact about (16) requires checking.
(The check is the computation showing (17) and (16) to be definitionally equal.)
Conversely, (K=Set:Type, A: K, a: A) is not definitionally equal to any opaque
signature followed by withs.


WITH-DEF: $\dfrac{(L,\ r = a : A) : Type}{(L,\ r : A)\ \text{with}\ r = a\ =\ (L,\ r = a : A) : Type}$

WITH-RO: $\dfrac{L\ \text{with}\ r = a : Type \qquad (L,\ p : B) : Type}{(L,\ p : B)\ \text{with}\ r = a\ =\ (L\ \text{with}\ r = a,\ p : B \circ u_r^a) : Type}\ \ p \neq r$

WITH-RM: $\dfrac{L\ \text{with}\ r = a : Type \qquad (L,\ p = b : B) : Type}{(L,\ p = b : B)\ \text{with}\ r = a\ =\ (L\ \text{with}\ r = a,\ p = b \circ u_r^a : B \circ u_r^a) : Type}\ \ p \neq r$


Fig. 6. Rules for with. $u_r^a$ : (L with r=a)L is the coercion forgetting the manifest
information r=a.


6 Ongoing Work and Conclusions 


I have not directly developed the meta-theory of this approach, but manifest
signatures can be expressed (i.e. programmed) in a logical framework such as
that of Martin-Löf or Luo [Luo99] extended with inductive-recursive definition
[Dyb97]. Thus, if the extended framework has good properties, as is informally
believed, the system with manifest signatures preserves these properties. Here is
an outline of this coding. Let lbl be a type having decidable equality, to be used
for labels. sign : Type and recd : (sign)Type are defined by induction-recursion.
sign has introduction rules


UNIT: $\dfrac{}{() : sign}$

OPAQ: $\dfrac{s : sign \qquad r : lbl \qquad A : (recd(s))Type}{(s,\ r : A) : sign}$

MAN: $\dfrac{s : sign \qquad r : lbl \qquad A : (recd(s))Type \qquad a : (l : recd(s))A(l)}{(s,\ r = a : A) : sign}$


The actual types, computed by recd, are Σ types and Ψ types:

recd(()) = ()    recd((s, r:A)) = ΣA    recd((s, r=a:A)) = ΨAa.


Projection, restriction and with are programmed using sign-elimination. 

It remains to experiment with this encoding, and develop its theory relative to 
the extended framework. This representation has first class labels with testable 
equality, which suggests possibilities for programmable renaming and opening. 


Conclusions I have shown surprisingly simple first-class dependently typed
records with manifest types. The manifest types are extended by a simple with
notation. They have no built-in notion of subtyping, but coercive subtyping
gives them a more flexible notion than the usual structural record subtyping.
I have not developed any meta-theory, but these records are “interpretable”
in type theory with inductive-recursive definitions. These records are suggested
for formalizing mathematical structures, not necessarily as modules for separate
checking or proof libraries. Further work remains on several important aspects
of usability, such as efficiency and renaming of fields.


Abstract. The semantics of the object-oriented, multi-threaded language
Java is informally described in the Java Specification Book [5] where
the memory model for concurrent threads is explained abstractly by means
of asynchronous events and informal rules relating their occurrences.
A formalization has been presented in [3] using certain posets of events
(called event spaces) and a structural operational (small-step) semantics.
Such an exact formal counterpart of the informal axiomatization of
the Specification Book may not only serve as a reference semantics for
different, possibly simplified, semantics, but also as a basis for language
analysis. In this paper we present a machine-checked version of the formalization
using Isabelle/HOL. Some proofs showing the redundancy of
axioms in the Java Specification Book are discussed. As usual, by Isabelle’s
austerity some tacit assumptions and a few minor mistakes were
revealed.


1 Introduction 


Java is an object-oriented programming language which offers a simple and 
tightly integrated support for concurrent programming. A concurrent program 
consists of multiple tasks that are or behave as if they were executed all at the 
same time. In Java tasks are implemented using threads (short for “threads of 
execution”), which are sequences of instructions that run independently within 
the encompassing program. Informal descriptions of this model can be found in 
several books (see e.g. [2], [7]). A precise description is given in the Java language 
specification [5]. In [3] a formal semantics of a non-trivial sublanguage of Java 
which includes dynamic creation of objects, blocks, error handling, and synchro- 
nization of threads has been presented. The semantics is given in the style of 
Plotkin’s structural operational semantics (SOS) [12]. This technique has been 
used e.g. for the semantics of SML [9] and earlier for ADA [8]. 

Once the semantics has been formalized, one can prove properties of the
language mathematically. Corresponding work has already been carried out for the
type system of (sequential) Java. In [4] it has been proved, using a big-step
semantics, that the type system for sequential Java is sound, and a machine-checked
proof in Isabelle/HOL has been given in [10].


J. Harrison and M. Aagaard (Eds.): TPHOLs 2000, LNCS 1869, pp. 480-497, 2000.

The thread model, and in particular the interaction between threads via shared
memory, is described in [3] in terms of structures called event spaces¹. By
using similar structures in operational semantics, an abstract “declarative”
description of the Java thread model is obtained which is an exact formal
counterpart of the informal language description [5] and which leaves maximal freedom
for different implementations. Moreover, it can then be formally proven that a
refinement of the semantics, e.g. using a concrete memory model, still fulfills the
specification of [3].

The paper is organized as follows: First the abstract syntax is defined. The 
next section deals with the formalization of events and event spaces for the Java 
memory model. Then the axiomatization of stacks and stores is briefly discussed.
In Section 5 the operational semantics is given as an inductively defined relation 
on configurations. Some remarks on the experiences with Isabelle and about 
ongoing and future work conclude the paper. 

Isabelle/HOL We assume that the reader is familiar with Isabelle/HOL. Even
if this is not the case, the definitions, as they appear in the paper, should
be readable for anyone with basic knowledge of predicate logic and functional
languages. It should be emphasized that ⟦ ⟧ enclose the assumptions of
a goal or rule, and that ⟹ denotes meta-implication, in contrast to ⟶, which
denotes implication on the object level. Also, note that in HOL the total function
space is denoted by a ⇒ b, whereas the partial function space is written
a ⇒ b option, where option is the usual lifting operator on types with constructors
Some :: 'a ⇒ 'a option and None :: 'a option. Moreover, we define
IsDefined(x) = (x ≠ None), and the function the is the partial inverse of Some.

Detailed information about Isabelle can be found, for example, on the web
page http://isabelle.in.tum.de or in [11].

Remark The Isabelle formalization closely follows [3], which can be consulted
for more detailed explanations of the subject.


2 Syntax 


In this section the relevant subset of Java syntax is introduced².


2.1 Primitive Syntactic Domains 


We assume the following primitive types: left values, also called locations (lval), 
identifiers (identifier) and references different from null (nonnullobject): 


types nonnullobject 
lval 
identifier 


1 These correspond roughly to configurations in Winskel’s event structures [13], which
are used for denotational semantics of concurrent languages.

2 It would have been nice to use the same definitions as [10] for the overlapping
(sequential) part, but our axiomatization began independently of loc. cit.


Object references (obj), right values or results (rval), and literals (literal)
of basic Java types (in our case just natural numbers and booleans) can then be
defined inductively:


datatype obj = Nullobj | 
Nonnull nonnullobject 


datatype rval = Oref obj |
Nval nat | 
Bval bool 


datatype literal = NatLit nat | 
BoolLit bool | 
Null 


Our formalization does not deal with typing problems (which are anyway 
considered in detail in [10]) but typing is still important for the operational 
semantics, so we have to model types. Any Java type (jtype) is either a class type 
(classtype) or a primitive type (primitivetype). How types are attributed to 
objects will be described later only abstractly (contrary to loc.cit). 


types classtype = identifier


datatype primitivetype = BoolType | 
NatType | 
VoidType 


datatype jtype = PrimType primitivetype |
                 ClType classtype


Identifiers must come equipped with some types since for field identifiers there 
is a statically resolved overloading and method identifiers have to be resolved by 
dynamic lookup. Therefore, field identifiers carry the class type of declaration 
and method identifiers carry the static type of invocation needed for dynamic 
dispatch. 


datatype fieldidentifier = FId identifier classtype 
datatype methodidentifier = MId fieldidentifier (jtype list) jtype 


Moreover, we use type throws to represent exceptions that have been already 
thrown and thus need to be propagated through the context. 


datatype throws = Throw obj 
types thread = obj 


The type thread is used to identify threads. As those are represented by 
objects in Java, one can simply use obj. 


2.2 Environments


Environments are needed to keep the local variables of a block, e.g. a method 
body. We prefer to present all the semantic domains of our operational semantics
by abstract datatypes instead of concrete definitions (see also Section 4), such
that they can be replaced later by concrete implementations. Having proved 
that such an implementation satisfies the axioms, all the proofs based on those 
axioms will remain valid. 

Environments consist of a set of declared variables (identthis) and a par- 
tial map from identifiers to values which is actually total for the subdomain of 
declared identifiers (envmapping). The special identifier ThisExp, representing 
the semantics of the keyword this of Java, has to be included in the set of 
identifiers. The abstract type of environments can thus be given by the following 
types and operations: 


datatype identthis = Ident identifier | 
ThisExp 


types envmapping = (identthis ⇒ (rval option))


datatype env = En (identthis set) envmapping


consts Updenv :: identthis ⇒ rval ⇒ env ⇒ env
       Mtenv  :: env
       Lookup :: identthis ⇒ env ⇒ (rval option)


The empty environment is represented by Mtenv, updating of variables and 
variable lookup are defined as expected (cf. Updenv and Lookup, respectively). 


2.3 Abstract Syntax 


The abstract syntax is a slightly simplified version of the BNF syntax given
for Java. The number of syntactic categories is reduced w.r.t. Table 1 of [3].
We only distinguish A-LeftHandSide (var), A-StatementExpression (stmexpr),
A-Expression (expr), A-Block (block), and A-CatchClause (catch).


datatype
  var =
      Var identifier |
      FieldVar expr fieldidentifier
and
  stmexpr =
      NewC classtype |
      Ass var expr |
      ValS rval |
      MCall expr methodidentifier (expr list) |
      AFrame methodidentifier block |
      ThrowSE throws |
      ReturnSE (rval option)
and
  expr =
      StmtExp stmexpr |
      Lit literal |
      Acc var |
      UnOp (rval ⇒ rval) expr |
      BinOp expr (rval ⇒ rval ⇒ rval) expr |
      This
and
  stat =
      Nop |
      SemiCol |
      ThrowStmt expr |
      BlockStmt block |
      ExpStmt stmexpr |
      SyncStmt expr block |
      TryStmt block catch (catch list) |
      TryFinStmt block (catch list) block |
      ReturnStmt (expr option) |
      CondStmt expr stat |
      VarDeclStmt jtype identifier expr
and
  block =
      BlockIt (stat list) env
and
  catch =
      cc jtype identifier block


This mutually recursive type follows the syntax of Java quite straightforwardly:
NewC stands for object creation³ (new in Java); Ass stands for
assignment (to program and field variables, both represented by type var); ValS
converts (right) values into expressions; MCall stands for method calls; activation
frames are built using AFrame in order to interpret method calls and to
determine the control flow in case of return statements; ThrowSE is used to
propagate an exception after it was thrown via the ThrowStmt statement; ReturnSE
propagates the result of a ReturnStmt, which corresponds to Java’s return; as
well as return e; by using a None or a Some term as argument, respectively.
Acc stands for variable access. Expressions of type stmexpr can be turned into
expressions via StmtExp but also into statements via ExpStmt. Blocks are built
from the constructor BlockIt, which takes a list of statements, so no explicit
append operator on statements (usually ;) is needed. The other constructors should be
obvious.

In [3] there was no explicit analogue to ThrowSE throws, ReturnSE (rval
option) and ValS rval. Without those, however, one cannot build a correct
abstract term for the result of a statement expression, e.g. an assignment. The
formalization helped to keep syntactic categories in order.

3 Note that we do not treat n-ary constructors with n > 0.


3 Event Spaces


The execution of a Java program comprises many threads of computation running
in parallel. Threads exchange information by operating on values and objects
residing in a shared main memory. As explained in the Java language specification
[5], each thread also has a private working memory in which it keeps its own 
working copy of variables that it must use or assign. As the thread executes a 
program, it operates on these working copies. The main memory contains the 
master copy of each variable. There are rules about when a thread is permit- 
ted or required to transfer the contents of its working copy of a variable into 
the master copy or vice versa. Moreover, there are rules which regulate the 
locking and unlocking of objects, by means of which threads synchronize with 
each other. These rules are given in [5, Chapter 17] and formalized in this section 
as “well-formedness” conditions for structures called event spaces. In the next 
section event spaces are included in the configurations of multi-threaded Java to 
constrain the applicability of certain operational rules. 

Memory Actions are defined in accordance with [5]:


datatype action =
    Lock thread obj |
    Unlock thread obj |
    Use thread lval rval |
    Assign thread lval rval |
    Load thread lval rval |
    Store thread lval rval |
    Read thread lval rval |
    Write thread lval rval


The terms Use, Assign, Load, Store, Read, Write, Lock, and Unlock are 
used here to name actions which describe the activity of the memories during 
the execution of a Java program. Use and Assign denote the above mentioned 
actions on the private working memory. Read and Load are used for a loosely 
coupled copying of data from the main memory to a working memory and dually 
Store and Write are used for copying data from a working memory to the main 
memory. 

Events are instances of actions, which happen at different times during exe- 
cution. Events are described abstractly without any commitment to a concrete 
representation, in particular without any time stamps or other coding in terms 
of natural numbers. We write e:Events(a) to indicate that an event e belongs 
to the set of instances of action a. Moreover, Alpha, Beta, and Betal distin- 
guish thread events, memory events, and memory events involving variables, 
respectively. 


types event

consts Events :: action ⇒ (event set)
       Alpha  :: thread ⇒ (event set)
       Beta   :: obj ⇒ (event set)
       Betal  :: lval ⇒ (event set)


The axioms for those event sets are:


Alpha t = { e. ∃p. (e ∈ (Events (Lock t p)) ∨
                    e ∈ (Events (Unlock t p))) ∨
               (∃l v. e ∈ (Events (Use t l v)) ∨
                      e ∈ (Events (Assign t l v)) ∨
                      e ∈ (Events (Load t l v)) ∨
                      e ∈ (Events (Store t l v)) ) }


Beta p = { e. ∃t. e ∈ (Events (Lock t p)) ∨
                  e ∈ (Events (Unlock t p)) }


Betal l = { e. ∃t v. e ∈ (Events (Write t l v)) ∨
                     e ∈ (Events (Read t l v)) }


⟦ e ∈ Events a; e ∈ Events b ⟧ ⟹ a = b


∀a S. ∃e ∈ Events(a). e ∉ S


where the last axiom states that there are always enough events of any kind. 

For instance, a rule about the interaction of locks and variables in the
specification book [5, 17.6, p. 407] states, in English prose, for a thread θ, a variable
V and a lock L:


“Between an assign action by [θ] on V and a subsequent unlock action by
[θ] on L, a store action by [θ] on V must intervene; moreover, the write action
corresponding to that store must precede the unlock action, as seen by the
main memory. (Less formally: if a thread is to perform an unlock action on
any lock, it must first copy all assigned values in its working memory back
out to main memory.)”


We shall return shortly to the question of how to formalize such a requirement.
First we must define appropriate relations on events.

An event relation is a relation


types evtrelation = (event × event) set


with a set of operations axiomatized below. An event is considered as “defi- 
ned” or in the carrier (cf. Carrier) of an event relation if it is related to itself 
(compare this to partial equivalence relations) so the definition of the carrier set 
is already part of the event relation: 


Carrier :: evtrelation ⇒ event set
Carrier E = { x. (x,x) ∈ E }

Down :: evtrelation ⇒ event ⇒ event set
Down E e = { d. (d,e) ∈ E }

DownSet :: evtrelation ⇒ event ⇒ (event ⇒ bool) ⇒ event set
DownSet E e P = { d. (d ∈ Down E e) ∧ P(d) }

TotalIn :: (event set) ⇒ evtrelation ⇒ bool
TotalIn A E = ∀x ∈ A. ∀y ∈ A. ((x,y) ∈ E ∨ (y,x) ∈ E)

Extends :: evtrelation ⇒ evtrelation ⇒ bool   (infixl 500)
X Extends Y =
  Carrier Y ⊆ Carrier X ∧ Y ⊆ X ∧
  (∀a b. a ∈ (Carrier Y) ∧ b ∈ (Carrier Y) ∧ (a,b) ∈ X
         ⟶ (a,b) ∈ Y)


The predicate TotalIn yields true if the set of events A is totally ordered 
w.r.t. E. The set Down E e contains all events in E below e, and DownSet E e P 
is the intersection of Down E e with the elements fulfilling P. 
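
As an informal illustration outside the Isabelle theory, the operations above can be modelled on finite relations in Python. Representing events by plain strings and a relation by a set of pairs is an assumption of this sketch only; the formalization itself keeps events abstract.

```python
# Finite model of the event-relation operations.
# Events are plain strings here (an assumption of this sketch).

def carrier(E):
    """Events that are related to themselves."""
    return {x for (x, y) in E if x == y}

def down(E, e):
    """All events d with (d, e) in E."""
    return {d for (d, t) in E if t == e}

def down_set(E, e, P):
    """Elements of down(E, e) that satisfy the predicate P."""
    return {d for d in down(E, e) if P(d)}

def total_in(A, E):
    """True if any two events of A are comparable in E."""
    return all((x, y) in E or (y, x) in E for x in A for y in A)

def extends(X, Y):
    """X contains Y and adds no order between events already in Y."""
    cy = carrier(Y)
    return (cy <= carrier(X) and Y <= X and
            all((a, b) in Y for (a, b) in X if a in cy and b in cy))

# A three-event chain a <= b <= c, given as a reflexive, transitive relation,
# and its two-event prefix.
chain = {("a", "a"), ("b", "b"), ("c", "c"),
         ("a", "b"), ("b", "c"), ("a", "c")}
prefix = {("a", "a"), ("b", "b"), ("a", "b")}
```

Here extends(chain, prefix) holds, while extends(prefix, chain) fails because the carrier of chain is not contained in that of prefix.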

The following maps are called “pairing” functions. 


Read_of, Load_of, Store_of, Write_of, Lock_of, Unlock_of ::
    evtrelation ⇒ event ⇒ event option


The function Read_of, for example, matches the n-th occurrence of Load(t,l,v) in
E with the n-th occurrence of Read(t,l,v) if such an event exists in E, and is
undefined otherwise. In other words, Read_of is a monotone injective partial
function with a partial inverse. This is expressed (without mentioning any natural
numbers) by the following axioms:


e ∉ Events (Load t l r) ⟹ Read_of E e = None


⟦ Read_of E e = (Some f); e ∈ Events (Load t l r) ⟧
⟹ f ∈ Events (Read t l r) ∧ f ∈ Carrier E


⟦ L1 ∈ (Events (Load t l r)); L2 ∈ (Events (Load t l r));
  Read_of E L1 = Some R1; Read_of E L2 = Some R2; (L1,L2) ∈ E ⟧
⟹ (R1,R2) ∈ E


⟦ L1 ∈ (Events (Load t l r)); (Read_of E L1) = Some R1;
  R1 ∈ (Events (Read t l r)); R2 ∈ (Events (Read t l r));
  (R2,R1) ∈ E ⟧
⟹ ∃L2 ∈ (Events (Load t l r)). (Read_of E L2) = Some R2


⟦ Read_of E L1 = Read_of E L2; IsDefined (Read_of E L1) ⟧
⟹ L1 = L2


Read_of E e = Some f ⟹ Load_of E f = Some e


For Store_of, Write_of, Lock_of, Unlock_of analogous rules hold. 
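
The intended reading of the pairing functions, "n-th Load with n-th Read", can be sketched in Python as follows. Unlike the axioms, the sketch uses explicit positions in two lists; the list representation and the event names L1, R1, ... are assumptions made for illustration.

```python
# Pair the n-th Load with the n-th Read for one thread and one variable.
# Both sequences are assumed to be listed in execution order.

def read_of(loads, reads):
    """Map each Load to its matching Read, or to None if that Read is missing."""
    return {ld: (reads[n] if n < len(reads) else None)
            for n, ld in enumerate(loads)}

def load_of(loads, reads):
    """Partial inverse: map each Read that is already paired to its Load."""
    return {rd: ld for ld, rd in read_of(loads, reads).items() if rd is not None}

loads = ["L1", "L2"]
reads = ["R1", "R2", "R3"]   # R3 has not been completed by a Load yet
pairing = read_of(loads, reads)
```

The resulting map is injective and order preserving, and load_of inverts it on the Reads that have a partner, which is what the axioms state without reference to positions.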

An event space is a poset of events (thought of as occurring in the given
order), i.e. an event relation such that


(trans E) ∧ (antisym E) ∧ (ClosedDown E) ∧ (ClosedUp E)
where


(ClosedDown E) = ∀(x,y) ∈ E. (x,x) ∈ E
(ClosedUp E)   = ∀(x,y) ∈ E. (y,y) ∈ E


in which every chain can be enumerated monotonically with respect to the
arithmetical ordering 0 < 1 < 2 < ... of natural numbers, or which fulfills the
slightly stronger condition


FiniteHist E =
  ∀e ∈ Carrier E. finite { d. d ∈ Carrier E ∧ (d,e) ∈ E }


and which satisfies the conditions below, which directly formalize the rules of [5,
Chapter 17]. Contrary to [6], in FiniteHist we have chosen to avoid a bijection
with the natural numbers and to use instead a condition that is nearer to
configurations of event structures [13], namely that the history of any event is finite.
The predicates ClosedUp and ClosedDown state reflexivity of the carrier of the
argument relation.
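
For a finite relation the order conditions can be checked directly. The following Python sketch is only an illustration: it assumes events are strings and relations are sets of pairs, and since FiniteHist holds trivially on finite relations it merely computes the history of an event.

```python
# Order-theoretic conditions on a finite event relation (a set of pairs).

def carrier(E):
    return {x for (x, y) in E if x == y}

def is_trans(E):
    return all((x, z) in E for (x, y) in E for (y2, z) in E if y == y2)

def is_antisym(E):
    return all(x == y for (x, y) in E if (y, x) in E)

def closed_down(E):
    """The first component of every pair lies in the carrier."""
    return all((x, x) in E for (x, y) in E)

def closed_up(E):
    """The second component of every pair lies in the carrier."""
    return all((y, y) in E for (x, y) in E)

def is_poset(E):
    return is_trans(E) and is_antisym(E) and closed_down(E) and closed_up(E)

def history(E, e):
    """The set whose finiteness FiniteHist demands for every event e."""
    return {d for d in carrier(E) if (d, e) in E}

chain = {("a", "a"), ("b", "b"), ("c", "c"),
         ("a", "b"), ("b", "c"), ("a", "c")}
```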


Event space rules For every rule we include a short, informal explanation
and refer to [5] for more detail. In the Isabelle code we usually write t for
a thread name θ and l for a (field) variable name l.

The actions performed by any one thread are totally ordered, and so are the 
actions performed by the main memory for any one variable [5, 17.2, 17.5]. 


Rule1 (E) = ∀t. TotalIn ((Alpha t) ∩ (Carrier E)) E
Rule2 (E) = ∀o. TotalIn ((Beta o) ∩ (Carrier E)) E
          ∧ ∀l. TotalIn ((Betal l) ∩ (Carrier E)) E


A Store action by θ on l must intervene between an Assign by θ of l and
a subsequent Load by θ of l. Less formally, a thread is not permitted to lose its
most recent assign [5, 17.3]:


Rule3 (E) = ∀t l r s.
  ∀A ∈ (Events (Assign t l r)). ∀L ∈ (Events (Load t l s)).
    (A,L) ∈ E
    ⟶ ∃u. ∃S ∈ (Events (Store t l u)). (A,S) ∈ E ∧ (S,L) ∈ E


A thread is not permitted to write data from its working memory back to 
main memory for no reason [5, 17.3]:


Rule4 (E) = ∀t l r s.
  ∀S1 ∈ (Events (Store t l r)). ∀S2 ∈ (Events (Store t l s)).
    S1 ≠ S2 ⟶ (S1,S2) ∈ E
    ⟶ ∃u. ∃A ∈ (Events (Assign t l u)). (S1,A) ∈ E ∧ (A,S2) ∈ E


Threads start with an empty working memory and new variables are created 
only in main memory and not initially in any thread’s working memory [5, 17.3]: 


Rule5 (E) = ∀t l r.
  ∀U ∈ (Events (Use t l r)). U ∈ (Carrier E)
    ⟶ (∃s. ∃A ∈ (Events (Assign t l s)). (A,U) ∈ E) ∨
       (∃s. ∃L ∈ (Events (Load t l s)). (L,U) ∈ E)


Rule6 (E) = ∀t l r.
  ∀S ∈ (Events (Store t l r)). S ∈ (Carrier E)
    ⟶ (∃s. ∃A ∈ (Events (Assign t l s)). (A,S) ∈ E)


A Use action transfers the contents of the thread’s working copy of a variable 
to the thread’s execution engine [5, §17.1]:


Rule7 (E) = ∀t l r s.
  ∀A ∈ (Events (Assign t l r)). ∀U ∈ (Events (Use t l s)).
    r ≠ s ⟶ (A,U) ∈ E
    ⟶ (∃u. ∃A1 ∈ (Events (Assign t l u)).
           (A,A1) ∈ E ∧ (A1,U) ∈ E ∧ A ≠ A1) ∨
       (∃w. ∃L ∈ (Events (Load t l w)). (A,L) ∈ E ∧ (L,U) ∈ E)


Rule8 (E) = ∀t l r s.
  ∀L ∈ (Events (Load t l r)). ∀U ∈ (Events (Use t l s)).
    r ≠ s ⟶ (L,U) ∈ E
    ⟶ (∃u. ∃A ∈ (Events (Assign t l u)). (L,A) ∈ E ∧ (A,U) ∈ E) ∨
       (∃w. ∃L1 ∈ (Events (Load t l w)). (L,L1) ∈ E ∧ (L1,U) ∈ E)


A Store action transmits the contents of the thread’s working copy of a 
variable to main memory [5, 17.1]: 


Rule9 (E) = ∀t l r s.
  ∀A ∈ (Events (Assign t l r)). ∀S ∈ (Events (Store t l s)).
    r ≠ s ⟶ (A,S) ∈ E
    ⟶ ∃u. ∃A1 ∈ (Events (Assign t l u)).
          (A,A1) ∈ E ∧ (A1,S) ∈ E ∧ A ≠ A1


Each Load or Write action is uniquely paired respectively with a matching 
Read or Store action that precedes it [5, 17.2, 17.3]: 


Rule10 (E) = ∀t l r.
  ∀L ∈ (Events (Load t l r)). L ∈ (Carrier E)
    ⟶ ∃R ∈ (Events (Read t l r)).
         (Read_of E L) = (Some R) ∧ (R,L) ∈ E


Rule11 (E) = ∀t l r.
  ∀W ∈ (Events (Write t l r)). W ∈ (Carrier E)
    ⟶ ∃S ∈ (Events (Store t l r)).
         (Store_of E W) = (Some S) ∧ (S,W) ∈ E


There are six more rules about locking and unlocking which are omitted due 
to lack of space. 
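
To give a feeling for how such rules constrain executions, the following Python sketch checks Rule3 and Rule6 on small finite event relations. It is an illustration only: the tuple encoding of events, the helper chain_relation and the restriction to events in the carrier are assumptions of the sketch, not part of the Isabelle theory.

```python
# An event is modelled as (kind, thread, variable, value, tag); the tag only
# keeps distinct occurrences of the same action apart (assumption of the sketch).

def chain_relation(events):
    """Reflexive, transitive relation ordering the given events as listed."""
    return {(events[i], events[j])
            for i in range(len(events)) for j in range(i, len(events))}

def carrier(E):
    return {x for (x, y) in E if x == y}

def same_var(e, f):
    """Same thread and same variable."""
    return e[1:3] == f[1:3]

def rule3(E):
    """A Store intervenes between an Assign and a later Load."""
    stores = [e for e in carrier(E) if e[0] == "Store"]
    return all(any(same_var(S, A) and (A, S) in E and (S, L) in E for S in stores)
               for (A, L) in E
               if A[0] == "Assign" and L[0] == "Load" and same_var(A, L))

def rule6(E):
    """Every Store in the carrier is preceded by an Assign."""
    assigns = [e for e in carrier(E) if e[0] == "Assign"]
    return all(any(same_var(A, S) and (A, S) in E for A in assigns)
               for S in carrier(E) if S[0] == "Store")

a = ("Assign", "t", "x", 1, 0)
s = ("Store", "t", "x", 1, 1)
ld = ("Load", "t", "x", 1, 2)
```

For instance, the execution Assign, Store, Load satisfies Rule3, whereas Assign followed directly by Load violates it; a Store that precedes every Assign violates Rule6.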


Discussion. Each of the above rules corresponds to one rule in [5]. Conversely, 
some more rules of [5] can be derived in our axiomatization. In particular, we 
can prove w.r.t. any event space E 


⟦ ClosedUp E; ClosedDown E; trans E; antisym E;
  FiniteHist E; Rule1 E; Rule3 E; Rule4 E; Rule6 E;
  S ∈ Events(Store t l s); L ∈ Events(Load t l r);
  (L,S) ∈ E ⟧
⟹ ∃u. ∃A ∈ Events(Assign t l u). (A,S) ∈ E ∧ (L,A) ∈ E


stated as an axiom in [5, 17.3], which means that between a Load and a subsequent
Store for the same variable by the same thread there is an Assign for this thread
and variable.

In fact, by Rule6 there must be some Assign action before the Store, indeed
a “maximal” one, because of the following lemma:


⟦ ClosedUp E; ClosedDown E; trans E; FiniteHist E;
  Rule1 E; e ∈ Carrier E ⟧ ⟹
  (DownSet E e P ≠ {}) ⟶ (∃max. max ∈ (DownSet E e P) ∧
      (∀c. c ∈ (DownSet E e P) ⟶ (c,max) ∈ E))


by setting P = λe. ∃t l v. (e = Assign t l v) ∧ e ∈ (Carrier E).
This maximal one must intervene between the Load and the Store, because
otherwise, from Rule1 and Rule3 there would be two Store events for the same
variable with no Assign in between, which contradicts Rule4.

Similarly, the following rule of [5, 17.3] can be derived from Rule10 and
Rule11:


⟦ ClosedDown E; ClosedUp E; antisym E; trans E; FiniteHist E;
  Rule2 E; Rule10 E; Rule11 E;
  L ∈ Events(Load t l r); S ∈ Events(Store t l s);
  (L,S) ∈ E; Write_of E S = Some W ⟧
⟹
  ∃R. Read_of E L = Some R ∧ (R,W) ∈ E


The clause Rule6 and the omitted Rule17 simplify the corresponding
rules of [5, 17.3, 17.6], which include a condition Load(t,l,v) < Store(t,l,v) to
the right of the implication. This extra condition is redundant, however, because
of the first lemma discussed above.

Note that the language specification requires any Read action to be completed 
by a corresponding Load and similarly for Store and Write. We do not translate 
such rules into well-formedness conditions for event spaces because the latter 
must capture incomplete program executions. 


IsEvtSpace E =
  ( (Rule1 E) ∧ (Rule2 E) ...
    ∧ (Rule16 E) ∧ (Rule17 E) ∧ (trans E) ∧ (antisym E)
    ∧ (ClosedDown E) ∧ (ClosedUp E) ∧ (FiniteHist E) )


Usage in operational semantics. Event spaces serve two purposes: First, they
provide all the information to reconstruct the current working memories of all
threads (which in fact do not appear in the configurations). Second, event spaces
record the “historical” information on the computation which constrains the
execution of certain actions according to the language specification, and hence
the applicability of certain operational rules.

A new event is adjoined to an event space E by extending the execution order 
as follows: If the event e is in Alpha t then it is above all other events in Alpha 
t, and analogously for memory events. 


AdjoinSet :: evtrelation ⇒ event ⇒ (evtrelation set)
AdjoinSet E a =
  { R. IsEvtSpace R ∧
       (Carrier R = (Carrier E) ∪ {a}) ∧
       a ∉ (Carrier E) ∧
       (R Extends E) ∧
       (∀t q l.
         (if a ∈ (Alpha t) ∩ (Carrier E)
          then (∀a1 ∈ (Alpha t) ∩ (Carrier R). (a1,a) ∈ R)
          else (if a ∈ (Beta q) ∩ (Carrier E)
                then (∀a1 ∈ (Beta q) ∩ (Carrier R). (a1,a) ∈ R)
                else (if a ∈ (Betal l) ∩ (Carrier E)
                      then (∀a1 ∈ (Betal l) ∩ (Carrier R). (a1,a) ∈ R)
                      else True)))) }


The term E [+] e denotes the space thus obtained, provided it obeys the 
above rules, and it is otherwise undefined. 


[+] :: evtrelation ⇒ event ⇒ (evtrelation option)   (infixl 999)
E [+] a = if (AdjoinSet E a = {})
          then None
          else Some (εF. F ∈ (AdjoinSet E a))
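
The definition of [+] picks an arbitrary element of AdjoinSet. A constructive approximation on finite relations can be sketched in Python as below. The sketch builds one particular candidate (the new event placed above all events of its own chain, closed under transitivity) and returns None when a supplied well-formedness check rejects it. The parameters same_chain and well_formed, and the encoding of events as (owner, name) pairs, are hypothetical stand-ins for Alpha/Beta/Betal membership and for IsEvtSpace.

```python
def adjoin(E, a, same_chain, well_formed):
    """Return E extended by a, or None if a is not new or the result is ill-formed."""
    if (a, a) in E:
        return None
    events = {x for (x, y) in E if x == y}
    # Put a above every event of its own chain (e.g. the same thread).
    R = set(E) | {(a, a)} | {(x, a) for x in events if same_chain(x, a)}
    # Close under transitivity so that the result is again an order.
    while True:
        extra = {(x, z) for (x, y) in R for (y2, z) in R if y == y2} - R
        if not extra:
            break
        R |= extra
    return R if well_formed(R) else None

# Events are (owner, name) pairs; two events share a chain if the owner agrees.
e1, e2, other, new = ("t", "a1"), ("t", "a2"), ("u", "b1"), ("t", "a3")
E = {(e1, e1), (e2, e2), (other, other), (e1, e2)}
same_owner = lambda x, y: x[0] == y[0]
R = adjoin(E, new, same_owner, lambda rel: True)
```

In this example R orders new above e1 and e2 but leaves it unrelated to the event of the other thread.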


4 Stacks and Stores 


Since the scope of local variables is determined by the block structure of the 
program, we keep them in a stack which grows and shrinks upon entering and 
exiting blocks. On the other hand, objects are permanent entities which survive 
the blocks in which they are created; therefore the collection of their instance va- 
riables (containing the values of their attributes) is kept in a separate structure: 
the store. Intuitively, stores can be thought of as mapping left-values (addresses 
of instance variables) to right-values (the primitive data of Java). Later on we 
shall see how different threads interact through the store. A formal description of 
stacks, stores and the configurations of the operational semantics is given below. 


datatype stack = Mtstack |
                 Push env stack

Pop          :: stack ⇒ stack
Top          :: stack ⇒ env
Bind         :: identthis ⇒ rval ⇒ stack ⇒ (stack option)
Assig        :: identthis ⇒ rval ⇒ stack ⇒ (stack option)
Lookup_stack :: identthis ⇒ stack ⇒ (rval option)


The (common) axioms are omitted. However, notice the difference between 
Bind and Assig. The former binds an identifier to a value in the top level 
environment, the latter searches in the stack for the first environment in which 
the identifier is declared and then updates it. 
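
The difference between Bind and Assig can be illustrated by a small Python sketch. Since the axioms are omitted here, the details are assumptions: a stack is a list of dictionaries with the top environment first, an identifier counts as declared in an environment if it is a key of the dictionary, and None plays the role of the undefined result.

```python
def bind(ident, value, stack):
    """Bind ident in the top environment; undefined on the empty stack."""
    if not stack:
        return None
    top = dict(stack[0])
    top[ident] = value
    return [top] + stack[1:]

def assig(ident, value, stack):
    """Update ident in the first environment (from the top) that declares it."""
    for i, env in enumerate(stack):
        if ident in env:
            updated = dict(env)
            updated[ident] = value
            return stack[:i] + [updated] + stack[i + 1:]
    return None

def lookup_stack(ident, stack):
    """Value of ident in the first environment that declares it, if any."""
    for env in stack:
        if ident in env:
            return env[ident]
    return None

stack = [{"y": 0}, {"x": 1}]   # top environment first
```

With this stack, bind("x", 5, stack) introduces a new x in the top environment and leaves the outer x untouched, whereas assig("x", 5, stack) updates the outer x.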


types store


MtStore :: store
New     :: classtype ⇒ store ⇒ (nonnullobj × store)
Upd     :: lval ⇒ rval ⇒ store ⇒ store
Get     :: lval ⇒ store ⇒ rval option
Init    :: fieldidentifier ⇒ store ⇒ rval


Given a class and a store, New creates a new (nonnull) reference pointing to
freshly allocated memory of the right size (depending on the class type) and
initializes the fields of this newly created object with values specified by Init.
This is axiomatized as follows:


IsDefined (Get (Lval (fst (New C s)) (FId i C)) (snd (New C s)) )
⟹ the (Get (Lval (fst (New C s)) (FId i C)) (snd (New C s)) )
    = Init (FId i C) s


The axioms for Upd, Get, and MtStore are omitted as they simply denote 
update, retrieval, and empty store, respectively. 


5 Semantic Rules for Multi-Threaded Java 


Stores assume in multi-threaded Java a more active role than they have in se- 
quential Java because of the way the main memory interacts with the working 
memories: a “silent” computational step changing the store may occur with- 
out the direct intervention of a thread’s execution engine. Changes to the store 
are subject to the previous occurrence of certain events which affect the state 
of computation. Event spaces are included in the configurations to record such 
historical information. 


Multi-threaded terms, stacks, and configurations. The type aterm serves as a 
supertype for statements, expressions and lists thereof. 


datatype aterm = ExprT expr |
                 ExprSeqT (expr list) |
                 StatT stat |
                 StatSeqT (stat list)


The type staterecord records whether a thread is ready to
execute (R), waits on an object having released a number of locks (W), or is in the
state of being notified (N), still having to claim a number of locks on an object.


datatype staterecord = R | 
W obj nat | 
N obj nat 


An mterm is a set of 4-tuples consisting of a thread name, an abstract term 
to be executed, the state of the thread and a stack for the program variables. A 
configuration is a 4-tuple containing an mterm (i.e. all threads and their actual 
state), a set of objects which are bound to die (only relevant when explicit 
stopping of threads is allowed), an event space, and the main memory called 
store. 


types mterm  = (thread × aterm × staterecord × stack) set
      config = (mterm × (obj set) × evtspace × store)


The one-step reduction relation to_in_a_step is abbreviated to ---> and its
transitive closure to_in_n_steps is abbreviated to -*->.


to_in_a_step  :: (config × config) set
to_in_n_steps :: (config × config) set
in_a_step     :: [config,config] ⇒ bool   ( _ ---> _ )
in_n_steps    :: [config,config] ⇒ bool   ( _ -*-> _ )


c1 ---> c2 = (c1,c2) ∈ to_in_a_step
c1 -*-> c2 = (c1,c2) ∈ to_in_n_steps


The operation WaitSet computes the threads that are awaiting release of a 
certain object lock. 


WaitSet :: mterm ⇒ obj ⇒ (obj set)
WaitSet T p = { t. ∃r n a. (t,a,(W p n),r) ∈ T }
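
As a small illustration, WaitSet can be computed on a finite set of thread tuples in Python. The encoding is an assumption of the sketch: state records are tuples such as ("R",), ("W", p, n) and ("N", p, n), and terms and stacks are replaced by placeholder values.

```python
def wait_set(T, p):
    """Threads of T whose state record is W p n for some n."""
    return {t for (t, a, state, r) in T if state[0] == "W" and state[1] == p}

# Each entry is (thread, term, state record, stack); terms and stacks are dummies.
T = {
    ("t1", "term1", ("R",), ()),
    ("t2", "term2", ("W", "lock", 2), ()),
    ("t3", "term3", ("N", "lock", 1), ()),
    ("t4", "term4", ("W", "other", 1), ()),
}
```

Only t2 waits on the object "lock"; t3 has already been notified and is therefore not in the wait set.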


Frame is an auxiliary (partial) function that produces the right activation 
frame for a method call instantiating the list of formal parameters by the given 
list of actual values. The definition of Frame is not difficult and omitted. It makes 
use of Methodbody, which is an abstract function (i.e. it has no definition but has
to be axiomatized appropriately if needed) that for a given identifier returns the 
definition body of the method together with its formal parameters. 


Frame      :: obj ⇒ methodidentifier ⇒ (rval list)
              ⇒ store ⇒ (stmexpr option)
Methodbody :: classtype ⇒ methodidentifier ⇒ store
              ⇒ ( block × (identifier list) ) option


Moreover we use the following abbreviations for sets of mterms: 


MT :: mterm
|| :: mterm ⇒ (thread × aterm × staterecord × stack) ⇒ mterm

MT = {}
T || m = insert m T


Let Val v abbreviate StmtExp (ValS v), which is a value obtained as the
result of an expression.


Spontaneous memory actions. 


⟦ (t,a,R,r) ∈ T; (Get l s) = Some v;
  (∃re ∈ (Events(Read t l v)). E [+] re = Some E1) ⟧
⟹
  (T,Q,E,s) ---> (T,Q,E1,s)


⟦ (t,a,R,r) ∈ T;
  (∃l v. ∃st ∈ (Events(Store t l v)). E [+] st = Some E1) ⟧
⟹
  (T,Q,E,s) ---> (T,Q,E1,s)


Note that the second rule “guesses” the value of the last Assign in order to 
perform a store action; axiom Rule9 ensures that the guess is right. There are 
analogous rules for Load and Write. 

The function for computing a left value out of a variable expression, if pos- 
sible, is called ExpLval (we omit the definition): 


ExpLval :: var ⇒ (lval) option
IsLval  :: var ⇒ bool
IsLval e = ExpLval e ≠ None


Assignment rules. First evaluate the variable on the left.


( (T1 || (t,(ExprT (Acc e1)),R,r1)), Q1, E1, s1 ) --->
( (T2 || (t,(ExprT (Acc e2)),z,r2)), Q2, E2, s2 )
⟹
( (T1 || (t,(ExprT (StmtExp (Ass e1 e))),R,r1)), Q1, E1, s1 )
--->
( (T2 || (t,(ExprT (StmtExp (Ass e2 e))),z,r2)), Q2, E2, s2 )


Next evaluate the expression on the right: 


( (T1 || (t,(ExprT e1),R,r1)), Q1, E1, s1 ) --->
( (T2 || (t,(ExprT e2),z,r2)), Q2, E2, s2 )
⟹
( (T1 || (t,(ExprT (StmtExp (Ass (Var i) e1))),R,r1)), Q1, E1, s1 )
--->
( (T2 || (t,(ExprT (StmtExp (Ass (Var i) e2))),z,r2)), Q2, E2, s2 )


⟦ IsLval l;
  ( (T1 || (t,(ExprT e1),R,r1)), Q1, E1, s1 ) --->
  ( (T2 || (t,(ExprT e2),z,r2)), Q2, E2, s2 ) ⟧
⟹
( (T1 || (t,(ExprT (StmtExp (Ass l e1))),R,r1)), Q1, E1, s1 )
--->
( (T2 || (t,(ExprT (StmtExp (Ass l e2))),z,r2)), Q2, E2, s2 )


If both sides are evaluated, update the working memory for an assignment
to a field variable (this involves the event space; the update of the environment
for a program variable is similar). Nothing is said, so far, about a possible
write-through to the store:


⟦ ExpLval le = Some l;
  (∃as ∈ (Events(Assign t l v)). E [+] as = (Some E1)) ⟧
⟹
( (T || (t,(ExprT (StmtExp (Ass le (Val v)))),R,r)), Q, E, s )
--->
( (T || (t,(ExprT (Val v)),R,r)), Q, E1, s )


Rules can be applied only if the operation [+] is defined for the given
arguments, that is, if the action being performed complies with the requirements
of the language specification. By the above rules, Assign actions (and likewise,
e.g., Use actions) are added to an event space only when dictated by the execution
of the current thread [5, 17.3]. There are 91 rules altogether, so for lack of
space we can present only a selection. Of course, some rules involving two
threads are interesting:


Starting other threads. Let IllegalThreadStateExcep :: classtype and Start,
Run :: methodidentifier be given constants. The precondition


Frame t2 Start [] s = None

of the following rules ensures that Start is not overloaded with a
user-defined method. The first rule describes the standard case; the second
covers the cases where the started thread is bound to die or has no appropriate
(user-defined) run() method. The third rule fires if one tries to start a
thread that is already running. According to the Java specification, this
results in an exception.


[| Frame t2 Start [] s = None;
t2 ∉ Q; Frame t2 Run [] s = Some se |]
⟹
( (T || (t1,(StatT (ExpStmt (MCall (Val (Oref t2)) Start []))),
R,r1)), Q, E, s ) --->
( ((T || (t1,(StatT Nop),R,r1))
|| (t2,(StatT (ExpStmt se)),R,Mtstack)), Q, E, s )


[| Frame t2 Start [] s = None;
(t2 ∈ Q ∨ (Frame t2 Run [] s) = None) |]
⟹
( (T || (t1,(StatT (ExpStmt (MCall (Val (Oref t2)) Start []))),
R,r1)), Q, E, s ) --->
( ((T || (t1,(StatT Nop),R,r1))
|| (t2,(StatT Nop),R,Mtstack)), Q - {t2}, E, s )


[| Frame t2 Start [] s = None; ((t2,a,z,r1) ∈ T ∨ t2 = t1);
New IllegalThreadStateExcep s = (p,s1) |]
⟹
( (T || (t1,(StatT (ExpStmt (MCall (Val (Oref t2)) Start []))),
R,r)), Q, E, s ) --->
( (T || (t1,StatT (ExpStmt (ThrowSE (Throw (Nonnull p)))),
R,r)), Q, E, s1 )


6 Why Doing It with Isabelle 


It is by now well accepted that a “paper and pencil” formalization should be
followed, if possible, by a thorough machine-checkable formalization. Theorem
provers and proof checkers like Isabelle provide a tool for constructing and
maintaining such a high and trustworthy level of formality. The present case,
the semantics of multi-threaded Java, has confirmed this claim once more.

In the formalization [3] of the informal Java specification [5] some minor 
errors could be found regarding the “type” correctness of abstract syntax. The 
use of a tool often reveals gaps and hidden assumptions. In our case, it gave rise 
to the discussion how to express the fact that chains of events are well-founded. 

Moreover, since terms of abstract syntax can become quite large and clumsy,
it is handy to have a system at one's disposal that helps to manipulate such
terms and to perform the bookkeeping. Syntactic sugar (already supported by
Isabelle) is important, and it should be as easy as possible to use in order to
obtain readable propositions (to say nothing of the proofs). One of the most
striking and helpful features of Isabelle is the available stock of theories
and theorems, easily accessible on the web.

To the Isabelle-novice, some behaviour of Isabelle may look quite confusing, 
in particular error messages like “inner syntax error”*. Another problem with 
error messages occurs in inductive definitions where they neither refer to the 
name of the rule nor to line numbers. 

A very tricky feature (especially if you have worked with other provers
before) is that Isabelle implicitly declares any unknown identifier as a new
variable. On the one hand, this is convenient, as one is freed from writing
many declarations. On the other hand, even a misspelled constant may lead to
opaque effects later on. Nested datatypes, with Isabelle-98 at least, produce
a memory overflow on a machine with 256 MB RAM (Isabelle gurus probably
have more memory at hand®) and caused more problems than expected when
coding up the syntax.


7 Conclusions and Future Work 


In this paper we have presented a machine-checked version of the structural 
operational semantics of the concurrency model of Java in Isabelle. Our seman- 
tics covers a substantial part of the dynamic behaviour of the language. Most 
notably type information (class, interface and method declarations) and some 
control flow statements are missing. By using Isabelle several minor flaws mainly 
regarding the abstract syntax and the syntactic well-definedness of terms on the 
right hand side of reduction rules were discovered in the “paper-and-pencil” 
specification of [3]. 

It would be nice to extend the multi-threaded semantics with a type-checking 
result in the form of a subject reduction theorem as done in [10] for sequential 
Java using big-step semantics. One should be able to reuse some parts of loc. cit. 
adopting their notation. 

On the basis of the presented Isabelle/HOL-theories we are currently try- 
ing to formalize more proofs about language analysis: The first is a correctness 
theorem of so-called “prescient” store actions [5, 17.8]. These actions “allow op- 
timizing Java compilers to perform certain kinds of code rearrangements that 
preserve the semantics of properly synchronized programs [ ... ].” ([3]). The 
second theorem for “properly synchronized programs”, i.e. those that change 
and access shared variables only in synchronized blocks, states that it cannot 
be observed by other threads whether read/load and store/write events happen 
synchronously or asynchronously. This would mean a possible simplification of 
the memory model. 


“ It can take e.g. a long time to detect that one is not allowed to use a variable name 
o as it is already used for function composition. Moreover, trying rewrite_tac in- 
stead of rewrite_goals_tac one obtains the mysterious message “proved different 
goal”. 

° Thanks to Markus Wenzel for pointing out to us the quick_and_dirty mode.

It seems a desirable and achievable goal to produce some Isabelle theories
containing the type system and semantics of Java plus some characteristic meta
properties of the language (including proofs). Those could not only serve as a
“certificate” for Java, but may also contain material that can be reused for the
semantics of other languages still awaiting definition.


Another Look at Nested Recursion


Konrad Slind 


Cambridge University Computer Laboratory 


Abstract. Functions specified by nested recursions are difficult to de- 
fine and reason about. We present several ameliorative techniques that 
use deduction in a classical higher-order logic. First, we discuss how an 
apparent circular dependency between the proof of nested termination 
conditions and the definition of the specified function can be avoided. 
Second, we propose a method that allows the specified function to be 
defined in the absence of a termination relation. Finally, we show how 
our techniques extend to nested program schemes, where a termination 
relation cannot be found until schematic parameters have been filled in. 
In each of these techniques, suitable induction theorems are automati- 
cally derived. 


1 Introduction 


Recursion equations specifying a function f are said to be nested when an argu- 
ment to a recursive call of f contains another invocation of f. For example, the 
second clause in the following equations has a nested recursion: 


g 0 = 0                                   (1)
g (Suc x) = g (g x).

Nested recursion has traditionally posed problems for mechanization,
especially in logics of total functions. The standard criterion in such a logic
for accepting that recursion equations form a ‘good’ definition is that the
arguments to recursive calls must decrease in a wellfounded relation. For our
example, this means that for some wellfounded relation R, both
∀x. R x (Suc x) and ∀x. R (g x) (Suc x) must be proved. Taking R to be the
less-than relation (<), the first of these termination conditions is easy
enough, but the second seems problematic. The trouble is that even stating
∀x. g x < Suc x as a meaningful proposition seems to assume that g has been
defined, but that is just the point of proving the termination conditions!
Thus there seems to be a circular dependency between definition and
termination proofs in the case of nested recursion.
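As an informal illustration (outside the logic, where none of the definitional issues arise), equations (1) can be transcribed into Python and the two termination conditions for R = (<) checked on a finite range. The transcription and the bound of 100 are our own choices for this sketch, not part of the formal development.

```python
def g(n: int) -> int:
    """Equations (1): g 0 = 0 and g (Suc x) = g (g x)."""
    if n == 0:
        return 0
    x = n - 1          # n = Suc x
    return g(g(x))     # the argument of the outer call is itself a call of g


# Check both termination conditions for R = (<) on a finite range.
for x in range(100):
    assert x < x + 1       # outer call:  x < Suc x
    assert g(x) < x + 1    # nested call: g x < Suc x
```

Of course, running the program presupposes what the logic has to establish: the nested condition can only be evaluated here because Python already lets us call g.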

We have found that working formally in a mechanized logic has helped to
clarify some of the intricacies surrounding nested recursion. In our general
approach, recursion equations (nested or not) are given meaning by deriving
them from the following wellfounded recursion theorem:¹


(f = WFREC R F) ∧ WF(R) ⊃ ∀x. f(x) = F (f|R,x) x.   (2)


To derive the specified equations, the equations are first reduced, via a pattern
matching translation similar to that used in compilation of functional programs,
to a functional F. The HOL constant definition mechanism is then used to define
the constant as an application of WFREC:


f = WFREC R F.   (3)


Subsequently, (2) is instantiated by (3). Further deductive steps are then required
to (automatically) extract termination conditions and prove them. If all the
termination conditions are proved, then the specified recursion equations may
be used as unconstrained rewrite rules. This approach has been mechanized in
the TFL system [17,19], which has been instantiated to HOL98 and Isabelle/HOL.
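The role of the restriction f|R,x in (2) can be pictured with a loose executable analogue. The sketch below is an assumption-laden illustration in Python, not the HOL definition: the helper name wfrec is ours, and where the logic keeps the restricted function total, this sketch simply raises an error when a recursive call is made on an argument that is not R-smaller.

```python
def wfrec(R, F):
    """Loose analogue of WFREC R F: the functional F only gets access to a
    restricted version of the function being defined, which may be called
    solely on arguments y with R(y, x)."""
    def f(x):
        def restricted(y):
            # Raising is our choice for this sketch; it makes a violated
            # termination condition visible at run time.
            if not R(y, x):
                raise ValueError(f"termination condition violated: R({y}, {x})")
            return f(y)
        return F(restricted, x)
    return f


def F(rec, n):
    """The functional obtained from equations (1)."""
    if n == 0:
        return 0
    x = n - 1
    return rec(rec(x))   # inner call needs R x (Suc x), outer R (g x) (Suc x)


g = wfrec(lambda y, x: y < x, F)
```

With R = (<) every recursive call passes the check, which mirrors the fact that the recursion equations become unconstrained once the termination conditions are proved.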

In TFL, every recursive definition is accompanied by an induction theorem, 
which is derived from the wellfounded induction theorem by an algorithm de- 
scribed in [18]. The style of such an induction theorem can be sloganized as ‘the 
induction hypothesis holds for each argument to a recursive call’. With this in 
mind, the following is the induction theorem specified by g: 


∀P. P 0 ∧ (∀x. P x ∧ P (g x) ⊃ P (Suc x)) ⊃ ∀v. P v.   (4)


This general approach, of defining total functions by appeal to wellfounded 
recursion and reasoning using induction theorems based on the recursions of 
the functions has been taken by many systems, most notably that developed by 
Boyer and Moore [4]. 

In the remainder of the paper, we discuss two styles of treating nested fun- 
ctions and their induction theorems. In the first, the termination relation is 
supplied at the moment the function is defined. In the second, the termination 
relation is not given at function definition time; instead, it is represented by a 
variable, which can be instantiated at the user’s convenience. Both of these styles 
circumvent the circularity problem described above, and both derive appropriate 
induction theorems. 


2 Definitions with Termination Relations 


This section is a detailed elaboration of work already reported in [17]. In a
verification of McCarthy’s 91 function, we defined 91 by supplying an
appropriate termination relation at definition time and were subsequently able
to prove the nested termination condition for 91. Somewhat surprisingly, the
proof of termination for 91 was self-contained, in the sense that the
specification of 91 was not needed; this provided a counterexample to the
prevailing wisdom [10,14] which held that proofs of termination and
correctness for nested recursive functions must be intertwined.


1 Notation: WFREC is a ‘controlled’ fixpoint operator, WF denotes wellfoundedness,
and f|R,x is a function restriction (none of these notions are explored in this paper).
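For readers who do not have the 91 function at hand, the following Python sketch gives its usual textbook definition; the text above does not spell it out, so the definition and the measure 101 - n used in the check are assumptions taken from the standard presentation rather than from [17].

```python
def f91(n: int) -> int:
    """McCarthy's 91 function in its standard form (assumed here)."""
    if n > 100:
        return n - 10
    return f91(f91(n + 11))   # nested recursive call


def measure(n: int) -> int:
    """A commonly used termination measure: 101 - n, truncated at 0."""
    return max(0, 101 - n)


# Nested termination condition on a finite range: for n <= 100 the
# argument of the outer call, f91 (n + 11), is smaller in the measure.
for n in range(0, 101):
    assert measure(f91(n + 11)) < measure(n)
```

As with g, the check is only an experiment on a finite range; it does not replace the proof of the nested termination condition.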

We will use g to illustrate these points. After transforming the equations (1)
into a functional F and making the definition g = WFREC (<) F, the following
constrained equations are derivable, as explained in [17]:²


WF(<) ⊢ g 0 = 0
[WF(<), x < Suc x, g x < Suc x] ⊢ g (Suc x) = g (g x).


It is simple to prove and eliminate WF(<) and x < Suc x, leaving


⊢ g 0 = 0                                (5)
g x < Suc x ⊢ g (Suc x) = g (g x).

To finish the termination proof of g requires showing the nested termination 
condition ∀x. g x < Suc x. However, the proof seems problematic since, in any
attempt to use the recursive clause of (5) in the proof, the hypothesis g x < Suc x 
must be eliminated. As already noted, the historical answer to this apparent 
circularity has been to assert that correctness and termination must be proved 
simultaneously for nested recursions, for then the correctness property can be 
used instead of attempting to unroll the function in the proof of the nested 
termination condition. 

However, another approach emerged in the mid-1990’s [8,17]: a strong enough 
inductive hypothesis allows the circularity to be avoided. For example, the nested 
termination condition for g can be proved by induction along < (a case split and 
a lemma are needed, however). In fact, applying (4) seems to be exactly what is 
needed. The full rendering of (4) is actually 


[WF(<), (∀x. x < Suc x), (∀x. g x < Suc x)]
⊢                                                       (6)
∀P. P 0 ∧ (∀x. P x ∧ P (g x) ⊃ P (Suc x)) ⊃ ∀v. P v.


Notice that the termination conditions are fully quantified; this is necessary
to make the derivation go through. As above, WF (<) and ∀x. x < Suc x are
trivial to prove and eliminate from the hypotheses, leaving us with


[∀x. g x < Suc x] ⊢ ∀P. P 0 ∧ (∀x. P x ∧ P (g x) ⊃ P (Suc x)) ⊃ ∀v. P v.   (7)


2 Unlike earlier work, termination condition extraction in this paper quantifies
termination conditions as little as possible, i.e., only variables bound in the
right hand side of an equation will become universally quantified in the
extracted termination conditions.

Attempting to apply this theorem to prove ∀x. g x < Suc x is obviously not
going to work because application of the induction theorem presupposes that
which is to be proved. However, a slight variant of (7) is usable: instead of
quantifying the nested termination condition and moving it onto the hypotheses
while deriving the induction theorem, the conditions on the use of the nested
induction hypothesis are left ‘in place’. This yields the following induction
theorem (where WF (<) and ∀x. x < Suc x have already been eliminated from the
hypotheses):


⊢ ∀P. P 0 ∧ (∀x. P x ∧ (g x < Suc x ⊃ P (g x)) ⊃ P (Suc x)) ⊃ ∀v. P v.   (8)

Here the conjunct (g x < Suc x ⊃ P (g x)) is the nested i.h.


We call this the provisional induction theorem. TFL automatically derives the 
provisional induction theorem for nested recursive definitions that come with 
termination relations. In some cases, the provisional induction theorem can be 
useful for proving nested termination conditions, and after they have been proved 
(by whatever means), they can be eliminated from the provisional induction 
theorem to obtain the specified induction theorem. With that background, we 
return to our example. 

∀x. g x < Suc x.                                  (9)


Proof. Induct with (8). This leaves two goals: g 0 < Suc 0 (which is proved by
unwinding (5) at 0) and the goal stemming from the recursive case:³


g (Suc x) < Suc (Suc x)
------------------------------------
0. g x < Suc x ⊃ g (g x) < Suc (g x)
1. g x < Suc x


Rewrite with (5); this is allowed because the constraint on applying (5) is just
assumption 1.

g (g x) < Suc (Suc x)
------------------------------------
0. g x < Suc x ⊃ g (g x) < Suc (g x)
1. g x < Suc x


The hypotheses yield g (g x) < Suc (g x). The proof then completes by a chain
of inequalities:

g (g x) ≤ g x < Suc x < Suc (Suc x).

□


Now (5) and (8) can be freed of (9); after this, they can be applied to, e.g., 
prove the specification of g: ∀x. g x = 0. Note that this property could also have
been proven straightaway by use of (5) and mathematical induction. However, 
that doesn’t invalidate our point: in many cases, termination and correctness 
can be proved separately, which is often simpler than a proof of the combined 
properties. 

The general picture that this small example illustrates is that—in some 
cases—the constrained recursion equations for a nested function f and its pro- 
visional induction theorem can be used to prove totality of f. A crucial point in 
the proof will require f to be ‘evaluated’ at an argument a such that a nested 
call f(b) results. The unrolling of f(a) will be allowed on condition that f(b)
terminates. If the inductive hypotheses are strong enough, this condition can be 
shown without circularity. 


3 The goal is above the line and assumptions are below.


3 Example: First Order Unification 


The nested unification algorithm we verify⁴ in this section was first described,
informally but in great detail, by Manna and Waldinger [10]. Larry Paulson
later verified the algorithm using Cambridge LCF [14]. Sten Agerholm
duplicated Paulson’s work in a version of LCF built inside the HOL system [1].
The algorithm has also been verified using Type Theory [16,3,11].

Following Paulson, we define terms as binary trees instead of the more stan- 
dard n-ary trees; this is not a significant limitation, as Paulson argues [14]. The 
following are the constructors of the term type: 


Var   : α → α term
Const : α → α term
.     : α term → α term → α term   (infix, left associative)


The variables of a term are denoted by vars_of, and can be defined by a simple
recursion. Similarly, the occurs check is defined as follows:


occ(u, Var v) = False
occ(u, Const c) = False
occ(u, M.N) = (u = M) ∨ (u = N) ∨ occ(u, M) ∨ occ(u, N)


This definition implies that the occurs check is actually a proper suboccurrence
check: for example, it is not true that occ(x, x).
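The term type and the occurs check can be sketched in Python as follows. The class names and the use of strings for variable and constant names are assumptions made for this illustration; Comb stands for the infix constructor M.N.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Comb:              # plays the role of the binary constructor M.N
    left: "Term"
    right: "Term"


Term = Union[Var, Const, Comb]


def occ(u: Term, t: Term) -> bool:
    """Proper suboccurrence check: does u occur strictly inside t?"""
    if isinstance(t, Comb):
        return (u == t.left or u == t.right
                or occ(u, t.left) or occ(u, t.right))
    return False         # the Var and Const clauses


x = Var("x")
assert not occ(x, x)                 # a term does not occur in itself
assert occ(x, Comb(x, Const("c")))   # but it occurs in x.c
```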
Substitutions are represented by association lists and applied by the following
definition.⁵


(Var v ◁ θ) = assoc v (Var v) θ
(Const c ◁ θ) = Const c
((M.N) ◁ θ) = (M ◁ θ).(N ◁ θ)


The equality of substitutions is defined as: (σ =ₛ θ) = ∀t. t ◁ σ = t ◁ θ.
Composition of substitutions is a left associative binary infix operator:


[] • bl = bl
((a, b) :: al) • bl = (a, b ◁ bl) :: (al • bl)


Many standard facts about substitutions are required in the formalization; e.g.,
composition of substitutions is associative: ⊢ (γ • σ) • θ =ₛ γ • (σ • θ) and that
iterated substitutions can be composed: (subst_comp) ⊢ (t ◁ (r • s)) = ((t ◁ r) ◁ s).
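Substitution application and composition can be transcribed in the same illustrative Python encoding (terms as small dataclasses, substitutions as lists of name/term pairs; all names below are our own). The sketch then checks one instance of subst_comp; this is a spot check on a single example, not a proof.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Comb:              # plays the role of M.N
    left: "Term"
    right: "Term"


Term = Union[Var, Const, Comb]
Subst = List[Tuple[str, Term]]


def subst(t: Term, theta: Subst) -> Term:
    """Apply a substitution to a term."""
    if isinstance(t, Var):
        for name, image in theta:      # first binding wins, as with assoc
            if name == t.name:
                return image
        return t                       # default: the variable itself
    if isinstance(t, Const):
        return t
    return Comb(subst(t.left, theta), subst(t.right, theta))


def compose(al: Subst, bl: Subst) -> Subst:
    """[] • bl = bl;  ((a, b) :: al) • bl = (a, b applied to bl) :: (al • bl)."""
    return [(a, subst(b, bl)) for a, b in al] + list(bl)


r = [("x", Comb(Var("y"), Const("a")))]
s = [("y", Const("b")), ("x", Const("c"))]
t = Comb(Var("x"), Var("y"))
assert subst(t, compose(r, s)) == subst(subst(t, r), s)   # instance of subst_comp
```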

The unification algorithm returns a most general unifier of the input terms,
when it succeeds. Unifiers are substitutions that make terms equal and a most
general unifier is a unifier that can be instantiated to get any unifier.


Unifier θ t u = (t ◁ θ = u ◁ θ)
MGU θ t u = Unifier θ t u ∧ ∀σ. Unifier σ t u ⊃ ∃γ. σ =ₛ θ • γ


4 The verification was performed in the Isabelle/HOL instantiation of TFL.
5 The assoc function is defined as

assoc d x [] = d
assoc d x ((p,q) :: t) = if x = p then q else assoc d x t.


The recursion equations specifying the unification algorithm are the following:


1. Unify(Const m, M.N) = None
2. Unify(M.N, Const x) = None
3. Unify(Const m, Var v) = Some[(v, Const m)]
4. Unify(M.N, Var v) = if occ(Var v, M.N) then None else Some[(v, M.N)]
5. Unify(Var v, M) = if occ(Var v, M) then None else Some[(v, M)]
6. Unify(Const m, Const n) = if (m = n) then Some[] else None
7. Unify(M1.N1, M2.N2) = case Unify(M1, M2)
                           of None => None
                            | Some θ => case Unify(N1 ◁ θ, N2 ◁ θ)
                                          of None => None
                                           | Some σ => Some(θ • σ).
                                                                        (10)

As can be seen, the algorithm recurses only in clause 7. The termination
of Unify is difficult to prove for two reasons: (1) the nested recursion and (2)
the arguments in Unify(N1 ◁ θ, N2 ◁ θ) can be larger than the arguments to
Unify(M1.N1, M2.N2), which means that a simple size-based relation won't
work. The termination relation (named UTR) for Unify is essentially that given
by Manna and Waldinger: UTR (M1, M2) (N1, N2) holds if

- vars_of M1 ∪ vars_of M2 is a proper subset of vars_of N1 ∪ vars_of N2; orelse
- vars_of M1 ∪ vars_of M2 = vars_of N1 ∪ vars_of N2, and the size of M1 is less
  than the size of N1 and the size of M2 is less than the size of N2.
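The recursion equations (10) can be read as a program. The Python sketch below follows the seven clauses in order; the encoding (dataclasses for terms, lists of pairs for substitutions, None for failure) and all helper names are assumptions made for illustration, and the code is a plain transcription rather than the verified Isabelle/HOL definition.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Comb:              # plays the role of M.N
    left: "Term"
    right: "Term"


Term = Union[Var, Const, Comb]
Subst = List[Tuple[str, Term]]


def occ(u: Term, t: Term) -> bool:
    if isinstance(t, Comb):
        return u == t.left or u == t.right or occ(u, t.left) or occ(u, t.right)
    return False


def subst(t: Term, theta: Subst) -> Term:
    if isinstance(t, Var):
        for name, image in theta:
            if name == t.name:
                return image
        return t
    if isinstance(t, Const):
        return t
    return Comb(subst(t.left, theta), subst(t.right, theta))


def compose(al: Subst, bl: Subst) -> Subst:
    return [(a, subst(b, bl)) for a, b in al] + list(bl)


def unify(t: Term, u: Term) -> Optional[Subst]:
    if isinstance(t, Const) and isinstance(u, Comb):       # clause 1
        return None
    if isinstance(t, Comb) and isinstance(u, Const):       # clause 2
        return None
    if isinstance(t, Const) and isinstance(u, Var):        # clause 3
        return [(u.name, t)]
    if isinstance(t, Comb) and isinstance(u, Var):         # clause 4
        return None if occ(u, t) else [(u.name, t)]
    if isinstance(t, Var):                                 # clause 5
        return None if occ(t, u) else [(t.name, u)]
    if isinstance(t, Const) and isinstance(u, Const):      # clause 6
        return [] if t == u else None
    theta = unify(t.left, u.left)                          # clause 7
    if theta is None:
        return None
    # Nested recursion: the arguments depend on the result of the first call.
    sigma = unify(subst(t.right, theta), subst(u.right, theta))
    if sigma is None:
        return None
    return compose(theta, sigma)


f, a, b = Const("f"), Const("a"), Const("b")
x, y = Var("x"), Var("y")
t, u = Comb(Comb(f, x), a), Comb(Comb(f, b), y)
theta = unify(t, u)
assert theta is not None and subst(t, theta) == subst(u, theta)
```

Note that failure is tested with `is None`, since the empty substitution `[]` (the result of clause 6 on equal constants) is a success.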


The formal details of the construction of UTR may be found in [19]. Only the
base cases of the termination proof for Unify require unfolding the definition of
UTR; in the induction step, we merely require that (1) UTR is transitive, and
(2) that (loosely) UTR ignores a certain amount of term structure:


(X, (A.(B.C), D.(E.F))) ∈ UTR
⊃                                        (11)
(X, ((A.B).C, (D.E).F)) ∈ UTR.


When the recursion equations together with UTR are submitted to TFL, the
following theorems are returned: a constrained set of equations, and a constrained
induction principle. The returned equations 1 to 6, not being recursive, are just
what was specified, and are constrained only by WF UTR. Equation 7 has the
following 2 non-nested termination conditions attached:
((M1, M2), (M1.N1, M2.N2)) ∈ UTR and WF UTR, and also the nested termination
condition. The two non-nested termination conditions are easy to prove and
eliminate from the returned recursion equations and the provisional induction
theorem, which is


⊢ ∀P. (∀m n. P (Const m, Const n)) ∧                                   (12)
      (∀m M N. P (Const m, M.N)) ∧
      (∀m v. P (Const m, Var v)) ∧
      (∀v M. P (Var v, M)) ∧
      (∀M N x. P (M.N, Const x)) ∧
      (∀M N v. P (M.N, Var v)) ∧
      (∀M1 N1 M2 N2.
         (∀θ. Unify(M1, M2) = Some θ
              ⊃ ((N1 ◁ θ, N2 ◁ θ), (M1.N1, M2.N2)) ∈ UTR
              ⊃ P (N1 ◁ θ, N2 ◁ θ))
         ∧ P (M1, M2) ⊃ P (M1.N1, M2.N2))
      ⊃
      ∀v v1. P (v, v1).


Next we tackle the nested termination condition. Applying the provisional in- 
duction theorem (12) makes the proof relatively straightforward. Manna and 
Waldinger needed to use idempotence of the substitutions coming from Unify—a 
correctness property—at a crucial point in their termination proof. In contrast 
we need no such extra information. 


∀θ. Unify(M1, M2) = Some θ ⊃ ((N1 ◁ θ, N2 ◁ θ), (M1.N1, M2.N2)) ∈ UTR.   (13)


Proof. The first thing we do is shrink the scopes of N1 and N2 as much as
possible. Thus it suffices to prove


∀M1 M2 θ. Unify(M1, M2) = Some θ ⊃
  ∀N1 N2. ((N1 ◁ θ, N2 ◁ θ), (M1.N1, M2.N2)) ∈ UTR

We then induct with (12) and simplify with the rewrite rules for Unify. The base
cases are all easy; for the recursive case, the following is to be shown:


(case Unify(M1, M2)
   of None => None
    | Some θ3 => case Unify(N1 ◁ θ3, N2 ◁ θ3)
                   of None => None
                    | Some σ => Some(θ3 • σ)) = Some θ
⊃
∀P Q. ((P ◁ θ, Q ◁ θ), ((M1.N1).P, (M2.N2).Q)) ∈ UTR

The two inductive hypotheses are the nested i.h.


⊢ ∀θ0. Unify(M1, M2) = Some θ0 ⊃
    ((N1 ◁ θ0, N2 ◁ θ0), (M1.N1, M2.N2)) ∈ UTR ⊃
    ∀θ1. Unify(N1 ◁ θ0, N2 ◁ θ0) = Some θ1 ⊃
      ∀P Q. ((P ◁ θ1, Q ◁ θ1), ((N1 ◁ θ0).P, (N2 ◁ θ0).Q)) ∈ UTR
                                                                        (14)
and the non-nested i.h.


⊢ ∀θ2. Unify(M1, M2) = Some θ2 ⊃                                       (15)
    ∀N1 N2. ((N1 ◁ θ2, N2 ◁ θ2), (M1.N1, M2.N2)) ∈ UTR.


The proof proceeds by two case analyses on the results of applying Unify. We
start by making a case analysis on Unify(M1, M2). The None case is vacuously
true; alternatively, suppose that Some θ3 is the result. We can use modus ponens
on (15) to prove


⊢ ∀N1 N2. ((N1 ◁ θ3, N2 ◁ θ3), (M1.N1, M2.N2)) ∈ UTR                   (16)

and therefore the nested i.h. (14) can be simplified (twice) to obtain


⊢ ∀θ1. Unify(N1 ◁ θ3, N2 ◁ θ3) = Some θ1 ⊃                             (17)
    ∀P Q. ((P ◁ θ1, Q ◁ θ1), ((N1 ◁ θ3).P, (N2 ◁ θ3).Q)) ∈ UTR.


The goal has also been reduced by the case analysis, yielding


(case Unify(N1 ◁ θ3, N2 ◁ θ3)
   of None => None
    | Some σ => Some(θ3 • σ)) = Some θ
⊃ ((P ◁ θ, Q ◁ θ), ((M1.N1).P, (M2.N2).Q)) ∈ UTR.


We now make a case analysis on Unify(N1 ◁ θ3, N2 ◁ θ3). Again, the None case
is vacuously true, and so suppose that Some σ is the result. We are left with the
goal


(Some(θ3 • σ) = Some θ) ⊃ ((P ◁ θ, Q ◁ θ), ((M1.N1).P, (M2.N2).Q)) ∈ UTR,

and hence the goal (by use of the injectivity of Some and also the theorem
subst_comp)

((P ◁ θ3 ◁ σ, Q ◁ θ3 ◁ σ), ((M1.N1).P, (M2.N2).Q)) ∈ UTR.


This second case analysis allows the nested i.h. (17) to be further simplified,
yielding

⊢ ∀P Q. ((P ◁ σ, Q ◁ σ), ((N1 ◁ θ3).P, (N2 ◁ θ3).Q)) ∈ UTR.            (18)


We then instantiate (18) by P ↦ P ◁ θ3 and Q ↦ Q ◁ θ3 to prove


⊢ ((P ◁ θ3 ◁ σ, Q ◁ θ3 ◁ σ),
   ((N1 ◁ θ3).(P ◁ θ3), (N2 ◁ θ3).(Q ◁ θ3))) ∈ UTR.                    (19)


All the conditions and quantifications of the nested i.h. have now been stripped
away, but we still haven’t used the nested i.h. By the definition of substitution,
(19) is equal to


((P ◁ θ3 ◁ σ, Q ◁ θ3 ◁ σ), ((N1.P) ◁ θ3, (N2.Q) ◁ θ3)) ∈ UTR.          (20)

The fact that UTR is transitive plus (20) allow the goal to be reduced to


(((N1.P) ◁ θ3, (N2.Q) ◁ θ3), ((M1.N1).P, (M2.N2).Q)) ∈ UTR.


The nested i.h. has now been used. The non-nested i.h. (16) can now be
instantiated with N1 ↦ N1.P and N2 ↦ N2.Q to prove


⊢ (((N1.P) ◁ θ3, (N2.Q) ◁ θ3), (M1.(N1.P), M2.(N2.Q))) ∈ UTR.          (21)


By modus ponens with (11) and (21), the goal is proved.

□

Next we obtain unconstrained rules (not shown) and the specified induction 
theorem: 


∀P. (∀m n. P (Const m, Const n)) ∧
    (∀m M N. P (Const m, M.N)) ∧
    (∀m v. P (Const m, Var v)) ∧
    (∀v M. P (Var v, M)) ∧
    (∀M N x. P (M.N, Const x)) ∧
    (∀M N v. P (M.N, Var v)) ∧
    (∀M1 N1 M2 N2.
       (∀θ. Unify(M1, M2) = Some θ ⊃ P (N1 ◁ θ, N2 ◁ θ)) ∧ P (M1, M2)
       ⊃ P (M1.N1, M2.N2))
    ⊃ ∀v v1. P (v, v1)
                                                                        (22)
Now the correctness of Unify can be quite directly established.


∀θ. (Unify(P, Q) = Some θ) ⊃ MGU θ P Q                                  (23)


Proof. By induction with (22), followed by expanding the definition of Unify. The
base cases are all simple. In the recursive case, the failure branches are trivial.
Thus assume that Unify(M1, M2) = Some θ and Unify(N1 ◁ θ, N2 ◁ θ) = Some σ
hold. The inductive hypotheses yield MGU θ M1 M2 and MGU σ (N1 ◁ θ) (N2 ◁ θ).
It remains to show MGU (θ • σ) (M1.N1) (M2.N2). It is immediate from the
definitions of MGU and Unifier and subst_comp that θ • σ unifies (M1.N1) and
(M2.N2); we now show that θ • σ is most general. Assume that γ unifies (M1.N1)
and (M2.N2), i.e., M1 ◁ γ = M2 ◁ γ and N1 ◁ γ = N2 ◁ γ. There is a δ such
that γ =ₛ θ • δ because θ is most general; hence, (N1 ◁ θ) ◁ δ = (N2 ◁ θ) ◁ δ.
There is a ρ such that δ =ₛ σ • ρ because σ is most general. We wish to prove
∃q. γ =ₛ θ • σ • q, and do so as follows:


     ∃q. γ =ₛ θ • σ • q
iff  ∃q. θ • δ =ₛ θ • σ • q
iff  ∃q. θ • δ =ₛ θ • (σ • q)
iff  ∃q. δ =ₛ σ • q
iff  ∃q. σ • ρ =ₛ σ • q
iff  ∃q. ρ =ₛ q

The relationless definition method of [17] provides a sound way to define
recursive functions without having to supply a correct termination relation with
the recursion equations. The termination relation occurs as a variable in the
termination conditions of a function defined in this style; the termination
conditions persist as constraints on the recursion equations and induction
theorem until it is convenient to eliminate them. Thus the definition of a
recursive function can be separated from the delivery of its termination
relation and the proof of its termination conditions. The general idea is that a
function f may be defined by computing its functional F and gathering its
termination conditions TC1(R) ... TCn(R), leaving the termination relation R
variable. In the definition, a suitable termination relation is chosen via an
indefinite description operator:


f = WFREC (εR. WF R ∧ TC1(R) ∧ ... ∧ TCn(R)) F


Subsequent steps assume the termination conditions and derive the constrained 
equations and induction theorem. 

This technique fails for nested functions, since the assembled termination 
conditions must mention the function being defined, and the attempted definition 
will therefore not be an abbreviation, with the result that the invocation of the 
primitive principle of definition will fail. In essence, the relationless technique 
depends on all functions occurring in an argument to a recursive call having 
already been defined. 

Work on program schemes [20] suggests a new way to deal with this problem.
The technique proceeds in two steps: first an auxiliary version of the function
is defined in which the termination relation is treated as a parameter;
subsequently, the desired function is defined in terms of the auxiliary.
Following these two definitions, the specified recursion equations and
induction theorem can be automatically derived. However, one oddity remains:
the termination conditions will be those of the auxiliary function.

To make the discussion more concrete, we return to the g function. The
first step is to compute the functional F for the recursion equations, instantiate
the recursion theorem, perform the ‘case’ reductions, and extract termination
conditions:

[WF R, G = WFREC R F] ⊢ G 0 = 0
[WF R, R x (Suc x), R (G x) (Suc x), G = WFREC R F] ⊢ G (Suc x) = G (G x)   (24)


Then the auxiliary function aux is defined; in the definition the termination
relation R simply becomes a parameter of aux:


aux R = WFREC R F.   (25)

Then (25) may be cancelled from (24):

[WF R] ⊢ aux R 0 = 0
[WF R, R x (Suc x), R (aux R x) (Suc x)] ⊢ aux R (Suc x) = aux R (aux R x)   (26)


Now the second definition (the intended one) can be made by gathering the
termination conditions for aux, quantifying them, and choosing a relation
satisfying them. Thus, letting εTC stand for
εR. WF R ∧ (∀x. R x (Suc x)) ∧ (∀x. R (aux R x) (Suc x)), we define


g = aux (εTC)   (27)


and also prove, by the Select Axiom,⁶


[WF R, (∀x. R x (Suc x)), (∀x. R (aux R x) (Suc x))]
⊢
WF (εTC) ∧                                          (28)
(∀x. (εTC) x (Suc x)) ∧
(∀x. (εTC) (aux (εTC) x) (Suc x)).


Substituting εTC for R in (26), we get

[WF (εTC)] ⊢ aux (εTC) 0 = 0
[WF (εTC), (εTC) x (Suc x), (εTC) (aux (εTC) x) (Suc x)]
  ⊢ aux (εTC) (Suc x) = aux (εTC) (aux (εTC) x)


and from this we obtain, by use of (28) in the assumptions and (27) in the
conclusions,


[WF R] ⊢ g 0 = 0
[WF R, ∀x. R x (Suc x), ∀x. R (aux R x) (Suc x)] ⊢ g (Suc x) = g (g x).


Examining the result, one can see that the specified equations have been 
derived. Also, a termination problem involving only aux has been generated. 
Moreover, the choice of the termination condition is completely unconstrained 
in the termination problem. A formal description of this definition technique is 
given in Appendix A. 
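The idea of treating the termination relation as a parameter can be mimicked in a small Python experiment. This is only a loose analogue under our own assumptions: wfrec raises an error where the logic would keep the function total, the choice operator ε has no executable counterpart (a candidate relation is simply passed in), and check_termination_conditions tests the two conditions on a finite range instead of proving them.

```python
def wfrec(R, F):
    """Recursion controlled by R: recursive calls must be on R-smaller arguments."""
    def f(x):
        def restricted(y):
            if not R(y, x):
                raise ValueError(f"R({y}, {x}) does not hold")
            return f(y)
        return F(restricted, x)
    return f


def F(rec, n):
    """The functional for g 0 = 0, g (Suc x) = g (g x)."""
    return 0 if n == 0 else rec(rec(n - 1))


def aux(R):
    """aux R = WFREC R F: the termination relation is a parameter."""
    return wfrec(R, F)


def check_termination_conditions(R, bound):
    """Test R x (Suc x) and R (aux R x) (Suc x) for all x below bound."""
    f = aux(R)
    for x in range(bound):
        if not R(x, x + 1):
            return False
        try:
            if not R(f(x), x + 1):
                return False
        except ValueError:      # a recursive call inside f(x) was not R-smaller
            return False
    return True


less_than = lambda y, x: y < x
predecessor = lambda y, x: y + 1 == x

assert check_termination_conditions(less_than, 20)
assert not check_termination_conditions(predecessor, 20)
```

The predecessor relation is wellfounded and satisfies the non-nested condition, but it fails the nested one at x = 1 (aux R 1 is 0, and 0 is not the predecessor of 2). This illustrates that the termination problem is phrased purely in terms of aux and the candidate relation.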

□


6 The Select Axiom of the HOL logic is ∀P x. P x ⊃ P (εx. P x).

We have automatically derived the specified recursion equations for g, and
generated an independent termination problem phrased in terms of the auxiliary 
function aux. The termination conditions of g and aux are identical. Two means 
of settling the termination problem have been derived fully automatically: the 
definition of the auxiliary function, and the provisional induction scheme for the 
auxiliary function. 

One might worry that the ability to prove termination has somehow been 
tampered with in the derivation. We believe that no termination arguments 
have been lost in this series of transformations. 


Proof sketch. Suppose a function f is defined by explicitly giving a termination
relation TR, i.e.,

f = WFREC TR F


In the extracted termination conditions, WF(TR), TC1, ..., TCn, the nested
conditions will have occurrences of f. Suppose that WF(TR), TC1, ..., TCn are
proved. Now consider a relationless definition of the same equations; this
defines the auxiliary function aux:


aux R = WFREC R F.


Termination condition extraction collects the same termination conditions as
for f, except that occurrences of TR are instead a variable R and occurrences
of f are instead applications aux R. If we now substitute R ↦ TR into these
termination conditions, all the non-nested termination conditions are provable,
since they are just the (non-nested) originals. That leaves the nested conditions,
in which occurrences of aux R are now aux TR. It is trivial to show f = aux TR,
and thus each nested termination condition is also provable.


□


4.1 Relationless Induction for Nested Functions 


Producing the specified induction theorem for relationless nested definitions is 
straightforward. The derivation depends on the provisional induction theorem 
that is automatically derived for the auxiliary function. Returning to our running 
example, the provisional induction theorem derived for aux is 


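\[
\begin{aligned}
&[\mathsf{WF}\,R,\ \forall x.\ R\,x\,(\mathsf{Suc}\,x)]\\
&\vdash\\
&\forall P.\ P\,0 \land (\forall x.\ P\,x \land (R\,(\mathsf{aux}\,R\,x)\,(\mathsf{Suc}\,x) \supset P\,(\mathsf{aux}\,R\,x)) \supset P\,(\mathsf{Suc}\,x)) \supset \forall v.\ P\,v.
\end{aligned}
\]


First, notice that the nested termination condition is not an assumption, but
is instead embedded in the conclusion of the theorem. Therefore, we add the
nested termination condition to the hypotheses, and then reduce the conclusion:


\[
\begin{aligned}
&[\mathsf{WF}\,R,\ \forall x.\ R\,x\,(\mathsf{Suc}\,x),\ \forall x.\ R\,(\mathsf{aux}\,R\,x)\,(\mathsf{Suc}\,x)]\\
&\vdash\\
&\forall P.\ P\,0 \land (\forall x.\ P\,x \land P\,(\mathsf{aux}\,R\,x) \supset P\,(\mathsf{Suc}\,x)) \supset \forall v.\ P\,v.
\end{aligned}
\]

As before, let $\varepsilon\mathit{TC}$ stand for

\[
\varepsilon R.\ \mathsf{WF}\,R \land (\forall x.\ R\,x\,(\mathsf{Suc}\,x)) \land (\forall x.\ R\,(\mathsf{aux}\,R\,x)\,(\mathsf{Suc}\,x)).
\]

Make the substitution $R \mapsto \varepsilon\mathit{TC}$ to obtain

\[
\begin{aligned}
&[\mathsf{WF}\,(\varepsilon\mathit{TC}),\ \forall x.\ (\varepsilon\mathit{TC})\,x\,(\mathsf{Suc}\,x),\ \forall x.\ (\varepsilon\mathit{TC})\,(\mathsf{aux}\,(\varepsilon\mathit{TC})\,x)\,(\mathsf{Suc}\,x)]\\
&\vdash\\
&\forall P.\ P\,0 \land (\forall x.\ P\,x \land P\,(\mathsf{aux}\,(\varepsilon\mathit{TC})\,x) \supset P\,(\mathsf{Suc}\,x)) \supset \forall v.\ P\,v.
\end{aligned}
\]

By use of (28) we can obtain


\[
\begin{aligned}
&[\mathsf{WF}\,R,\ \forall x.\ R\,x\,(\mathsf{Suc}\,x),\ \forall x.\ R\,(\mathsf{aux}\,R\,x)\,(\mathsf{Suc}\,x)]\\
&\vdash\\
&\forall P.\ P\,0 \land (\forall x.\ P\,x \land P\,(\mathsf{aux}\,(\varepsilon\mathit{TC})\,x) \supset P\,(\mathsf{Suc}\,x)) \supset \forall v.\ P\,v.
\end{aligned}
\]


Simplifying with (27) then yields


\[
\begin{aligned}
&[\mathsf{WF}\,R,\ \forall x.\ R\,x\,(\mathsf{Suc}\,x),\ \forall x.\ R\,(\mathsf{aux}\,R\,x)\,(\mathsf{Suc}\,x)]\\
&\vdash\\
&\forall P.\ P\,0 \land (\forall x.\ P\,x \land P\,(g\,x) \supset P\,(\mathsf{Suc}\,x)) \supset \forall v.\ P\,v.
\end{aligned}
\]

□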


The formal derivation of this class of induction theorems may be found in [19]. 


5 Example: Term Evaluation 


Kapur and Subramaniam [9] pose the following induction challenge (we have
slightly edited it for stylistic purposes). Consider a datatype of arithmetic
expressions arith, having constructors for constants (C), variables (V), the
addition of two expressions (Plus), and the Apply operator.


C : num → α arith
V : α → α arith
Plus : α arith → α arith → α arith
Apply : α arith → α → α arith → α arith


Two ways to evaluate expressions are given. The call-by-name strategy is a
mutual and nested recursion with a ‘helper’ function:


CBN (C n, y, z) = C n
CBN (V x, y, z) = if x = y then CBNh z else V x
CBN (Plus a1 a2, y, z) = Plus (CBN (a1, y, z)) (CBN (a2, y, z))
CBN (Apply B v M, y, z) = CBN (CBN (B, v, M), y, z)


CBNh (C n) = C n
CBNh (V x) = V x
CBNh (Plus a1 a2) = Plus (CBNh a1) (CBNh a2)
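CBNh (Apply B v M) = CBN (B, v, M)


Apply B v M can be thought of as a β-redex: (λv.B) M. CBN(e, y, z) replaces
y by z in e, and also reduces Apply nodes. The definition returns the specified
rules and the specified induction theorem:


\[
\begin{aligned}
&\forall P_0\,P_1.\\
&\quad (\forall n\,y\,z.\ P_0\,(\mathsf{C}\,n, y, z)) \land {}\\
&\quad (\forall x\,y\,z.\ ((x = y) \supset P_1\,z) \supset P_0\,(\mathsf{V}\,x, y, z)) \land {}\\
&\quad (\forall a_1\,a_2\,y\,z.\ P_0\,(a_2, y, z) \land P_0\,(a_1, y, z) \supset P_0\,(\mathsf{Plus}\,a_1\,a_2, y, z)) \land {}\\
&\quad (\forall B\,v\,M\,y\,z.\ P_0\,(\mathsf{CBN}\,(B, v, M), y, z) \land P_0\,(B, v, M) \supset P_0\,(\mathsf{Apply}\,B\,v\,M, y, z)) \land {}\\
&\quad (\forall n.\ P_1\,(\mathsf{C}\,n)) \land {}\\
&\quad (\forall x.\ P_1\,(\mathsf{V}\,x)) \land {}\\
&\quad (\forall a_1\,a_2.\ P_1\,a_2 \land P_1\,a_1 \supset P_1\,(\mathsf{Plus}\,a_1\,a_2)) \land {}\\
&\quad (\forall B\,v\,M.\ P_0\,(B, v, M) \supset P_1\,(\mathsf{Apply}\,B\,v\,M))\\
&\quad \supset\\
&\quad (\forall v_0.\ P_0\,v_0) \land (\forall v_1.\ P_1\,v_1)
\end{aligned}
\tag{29}
\]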

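There are nine termination constraints, which we will abbreviate for clarity
(occurrences of INL and INR arise because of the modelling of the mutual recursion
as a ‘union’ function over a sum [2]; this will not occupy us here):


\[
\begin{aligned}
&\mathsf{CBN\_Terminates}(R) =\\
&\quad \mathsf{WF}\,R \land {}\\
&\quad (\forall M\,v\,B.\ R\,(\mathsf{INL}\,(B, v, M))\,(\mathsf{INR}\,(\mathsf{Apply}\,B\,v\,M))) \land {}\\
&\quad (\forall a_1\,a_2.\ R\,(\mathsf{INR}\,a_2)\,(\mathsf{INR}\,(\mathsf{Plus}\,a_1\,a_2))) \land {}\\
&\quad (\forall a_2\,a_1.\ R\,(\mathsf{INR}\,a_1)\,(\mathsf{INR}\,(\mathsf{Plus}\,a_1\,a_2))) \land {}\\
&\quad (\forall z\,y\,M\,v\,B.\ R\,(\mathsf{INL}\,(\mathsf{auxCBN}\,R\,(\mathsf{INL}\,(B, v, M)), y, z))\,(\mathsf{INL}\,(\mathsf{Apply}\,B\,v\,M, y, z))) \land {}\\
&\quad (\forall z\,y\,M\,v\,B.\ R\,(\mathsf{INL}\,(B, v, M))\,(\mathsf{INL}\,(\mathsf{Apply}\,B\,v\,M, y, z))) \land {}\\
&\quad (\forall a_1\,z\,y\,a_2.\ R\,(\mathsf{INL}\,(a_2, y, z))\,(\mathsf{INL}\,(\mathsf{Plus}\,a_1\,a_2, y, z))) \land {}\\
&\quad (\forall a_2\,z\,y\,a_1.\ R\,(\mathsf{INL}\,(a_1, y, z))\,(\mathsf{INL}\,(\mathsf{Plus}\,a_1\,a_2, y, z))) \land {}\\
&\quad (\forall z\,y\,x.\ (x = y) \supset R\,(\mathsf{INR}\,z)\,(\mathsf{INL}\,(\mathsf{V}\,x, y, z))).
\end{aligned}
\]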


Note how the nested invocation of CBN has been transformed into auxCBN in 
the termination constraints. 

The call-by-value strategy (also a nested function) uses an environment of 
evaluated expressions, accessed by a simple lookup function: 


lookup x [] = 0
lookup x ((y, z) :: rst) = if x = y then z else lookup x rst


CBV (C n, env) = n
CBV (V x, env) = lookup x env
CBV (Plus a1 a2, env) = CBV (a1, env) + CBV (a2, env)
CBV (Apply B v M, env) = CBV (B, (v, CBV (M, env)) :: env)

The following are the termination conditions of CBV; we again make a definition
that encapsulates them: 


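\[
\begin{aligned}
&\mathsf{CBV\_Terminates}(R) =\\
&\quad \mathsf{WF}\,R \land {}\\
&\quad (\forall a_2\,\mathit{env}\,a_1.\ R\,(a_1, \mathit{env})\,(\mathsf{Plus}\,a_1\,a_2, \mathit{env})) \land {}\\
&\quad (\forall a_1\,\mathit{env}\,a_2.\ R\,(a_2, \mathit{env})\,(\mathsf{Plus}\,a_1\,a_2, \mathit{env})) \land {}\\
&\quad (\forall v\,B\,\mathit{env}\,M.\ R\,(M, \mathit{env})\,(\mathsf{Apply}\,B\,v\,M, \mathit{env})) \land {}\\
&\quad (\forall \mathit{env}\,M\,v\,B.\ R\,(B, (v, \mathsf{auxCBV}\,R\,(M, \mathit{env})) :: \mathit{env})\,(\mathsf{Apply}\,B\,v\,M, \mathit{env}))
\end{aligned}
\]


With the definitions finished, correctness can be stated:


\[
\begin{aligned}
&\mathsf{CBN\_Terminates}(R) \land \mathsf{CBV\_Terminates}(R')\\
&\supset\\
&\forall x\,y\,z\,\mathit{env}.\ \mathsf{CBV}\,(\mathsf{CBN}\,(x, y, z), \mathit{env}) = \mathsf{CBV}\,(x, (y, \mathsf{CBV}\,(z, \mathit{env})) :: \mathit{env})
\end{aligned}
\]

Proof. By induction with (29), where the following instantiations are made:

\[
\begin{aligned}
P_0 &\mapsto \lambda(x, y, z).\ \forall \mathit{env}.\ \mathsf{CBV}\,(\mathsf{CBN}\,(x, y, z), \mathit{env}) = \mathsf{CBV}\,(x, (y, \mathsf{CBV}\,(z, \mathit{env})) :: \mathit{env})\\
P_1 &\mapsto \lambda z.\ \forall \mathit{env}.\ \mathsf{CBV}\,(\mathsf{CBNh}\,z, \mathit{env}) = \mathsf{CBV}\,(z, \mathit{env}).
\end{aligned}
\]

The instantiation for $P_0$ just sets it to the goal at hand. The instantiation for
$P_1$, however, has been suggested by Boulton’s method [2] for finding induction
predicates for mutual recursive functions. With these two instantiations, the
remainder of the proof is an anticlimax: it is simply conditional rewriting with
the induction hypotheses and the definitions of CBN, CBNh, CBV, and lookup. □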
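The correctness statement can be spot-checked by transcribing the definitions of CBN, CBNh, lookup, and CBV into an executable language. The following is a minimal sketch in Python, not part of the HOL development: the tagged-tuple encoding of arith terms, the lower-case function names, and the helper agree are assumptions made for illustration. Running a few cases is testing, not proof, and termination on the chosen inputs is simply observed rather than established.

```python
# Hypothetical encoding of arith terms as tagged tuples:
#   ("C", n) | ("V", x) | ("Plus", a1, a2) | ("Apply", B, v, M)

def cbn(e, y, z):
    """Call-by-name: replace variable y by z in e, reducing Apply nodes."""
    tag = e[0]
    if tag == "C":
        return e
    if tag == "V":
        return cbnh(z) if e[1] == y else e
    if tag == "Plus":
        return ("Plus", cbn(e[1], y, z), cbn(e[2], y, z))
    _, body, v, arg = e                      # Apply B v M
    return cbn(cbn(body, v, arg), y, z)      # the nested recursive call

def cbnh(e):
    """Helper for cbn: reduce the Apply nodes of e."""
    tag = e[0]
    if tag in ("C", "V"):
        return e
    if tag == "Plus":
        return ("Plus", cbnh(e[1]), cbnh(e[2]))
    _, body, v, arg = e
    return cbn(body, v, arg)

def lookup(x, env):
    """Value bound to x in env (a list of pairs), or 0 if x is unbound."""
    for name, value in env:
        if name == x:
            return value
    return 0

def cbv(e, env):
    """Call-by-value evaluation of e to a number in environment env."""
    tag = e[0]
    if tag == "C":
        return e[1]
    if tag == "V":
        return lookup(e[1], env)
    if tag == "Plus":
        return cbv(e[1], env) + cbv(e[2], env)
    _, body, v, arg = e
    return cbv(body, [(v, cbv(arg, env))] + env)

def agree(x, y, z, env):
    """One instance of the stated correctness property."""
    return cbv(cbn(x, y, z), env) == cbv(x, [(y, cbv(z, env))] + env)

# (λa. a + b) (2 + b), then substitute the constant 5 for b.
sample = ("Apply", ("Plus", ("V", "a"), ("V", "b")), "a",
          ("Plus", ("C", 2), ("V", "b")))
print(agree(sample, "b", ("C", 5), [("b", 7)]))  # True
```

Both sides of the equation evaluate to 12 on this input: the call-by-name side first rewrites the term to Plus (Plus (C 2) (C 5)) (C 5), while the call-by-value side evaluates the original term in the extended environment.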


The partial correctness proof was surprisingly easy, given a correctly
instantiated induction theorem. What is of more interest from the perspective of
the present paper is how simple the whole exercise was when termination became
a background issue. It took the author very little time to type the definitions
in and get the proof. In a setting where first a correct termination relation had
to be provided and proved, things would have taken much longer and the point
of the exercise (to see how the induction proof worked) would have been like a
mirage: visible but not attainable without some sweat. Of course the termination
proofs must still be completed (see footnote 7); what our approach allows is
flexibility as to when they are tackled.


7 Termination of CBV is easy since all recursions are on immediate sub-expressions.
Termination of CBN comes from noticing that it removes all Apply nodes from an
expression; when there are none, it recurses on smaller expressions. An important
part of the proof of termination of CBN is the use of the provisional induction
theorem for auxCBN to prove that applying auxCBN to an expression removes all
Apply nodes.


6 Schemes


Recursion equations with extra free variables on the ‘right hand side’ are known
as schemes. The definition algorithms and the derivation of induction for nested
recursions are essentially unchanged for schemes. The only step requiring special
treatment is the definition of the auxiliary function:


aux R = WFREC R F.


This must now take account of $X_1, \ldots, X_k$, the free variables of F, as follows
(for more background to our approach, see [20]):


aux R = λX1 ... Xk. WFREC R F.


As an example, the book ML for the Working Programmer [15] cites a nested
scheme (p. 225 in the first edition) which apparently requires domain theory
for the proof of a property:


Our approach to program schemes is simpler than resorting to domain
theory, but is less general. In domain theory it is simple to prove that
any ML function of the form


fun h x = if p x then x else h (h (g x))


satisfies

h (h x) = h x


for all x, regardless of whether the function terminates. Our approach
cannot easily handle this. What well-founded relation should we use to
demonstrate the termination of the nested recursive call in h?


Because our semantics is based on wellfoundedness, our system cannot admit
instances of p and g that allow h to loop. However, Paulson’s question can be
given a partial answer, provided one is content to admit only instantiations of
p and g that make h a total function. To see how this works out, we start by
making the definition


h x = if p x then x else h (h (g x)),

which yields the constrained theorem


\[
\begin{aligned}
&[\mathsf{WF}\,R,\ \forall x.\ \lnot p\,x \supset R\,(g\,x)\,x,\ \forall x.\ \lnot p\,x \supset R\,(\mathsf{haux}\,R\,g\,p\,(g\,x))\,x]\\
&\vdash\\
&h\,g\,p\,x = \mathsf{if}\ p\,x\ \mathsf{then}\ x\ \mathsf{else}\ h\,g\,p\,(h\,g\,p\,(g\,x)),
\end{aligned}
\tag{30}
\]

and the similarly constrained induction theorem


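\[
\begin{aligned}
&[\mathsf{WF}\,R,\ \forall x.\ \lnot p\,x \supset R\,(g\,x)\,x,\ \forall x.\ \lnot p\,x \supset R\,(\mathsf{haux}\,R\,g\,p\,(g\,x))\,x]\\
&\vdash\\
&\forall Q.\ (\forall x.\ (\lnot p\,x \supset Q\,(g\,x)) \land (\lnot p\,x \supset Q\,(h\,g\,p\,(g\,x))) \supset Q\,x) \supset \forall v.\ Q\,v.
\end{aligned}
\tag{31}
\]

With these to hand, it is indeed easy to prove the theorem

\[
\begin{aligned}
&[\mathsf{WF}\,R,\ \forall x.\ \lnot p\,x \supset R\,(g\,x)\,x,\ \forall x.\ \lnot p\,x \supset R\,(\mathsf{haux}\,R\,g\,p\,(g\,x))\,x]\\
&\vdash\\
&\forall x.\ h\,g\,p\,(h\,g\,p\,x) = h\,g\,p\,x.
\end{aligned}
\tag{32}
\]


Proof. Induct with (31) and then expand with (30). The resulting goal yields to
an automatic first order prover supplied with (30). □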

In order to remove the constraints on (30),(31), or (32), instantiations for p 
and g must be found, then a suitable R must be found, and then the termination 
conditions can finally be proved, using the recursion equations and provisional 
induction theorem for haux. 
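To make the last step concrete, here is a small sketch in Python of one total instance of the scheme. Everything specific in it is an assumption chosen for illustration: the instance (p holds on multiples of 5, g is the successor), the measure d, and the names make_h, non_nested_ok, nested_ok, and idempotent. The candidate relation is R y x iff d y < d x, and the checks only sample finitely many arguments, so they illustrate the constraints of (30) and the conclusion of (32) rather than prove them.

```python
def make_h(p, g):
    """Build h from the scheme  h x = if p x then x else h (h (g x))."""
    def h(x):
        return x if p(x) else h(h(g(x)))
    return h

# Hypothetical instance: p holds on multiples of 5, g is the successor.
p = lambda x: x % 5 == 0
g = lambda x: x + 1
h = make_h(p, g)

def d(x):
    """Distance from x up to the next multiple of 5 (0 when p x holds)."""
    return (-x) % 5

sample = range(50)

# Non-nested constraint of (30):  not (p x)  implies  R (g x) x.
non_nested_ok = all(d(g(x)) < d(x) for x in sample if not p(x))

# Nested constraint of (30), with h standing in for haux R g p:
#   not (p x)  implies  R (h (g x)) x.
nested_ok = all(d(h(g(x))) < d(x) for x in sample if not p(x))

# Conclusion of (32):  h (h x) = h x.
idempotent = all(h(h(x)) == h(x) for x in sample)

print(non_nested_ok, nested_ok, idempotent)
```

For this instance h rounds its argument up to the next multiple of 5, which is why the nested call always lands on a value of measure 0.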


7 Related Work 


Boyer and Moore [5] require nested recursions to be first proved to satisfy
non-nested recursion equations before being admitted as definitions.

PVS relies on its type system to support nested recursive definitions.
Essentially, the specification of the function is used in proving termination: nested
recursive calls are required to lie in the set of behaviours of the function by clever
use of subtyping [13].

The paper [6] gives an external semantics via a fixpoint operator for the
recursive functions, including nested recursions, of the LAMBDA logic. The
implementation of LAMBDA automatically extracts termination conditions, but
doesn’t automatically derive induction theorems.

Giesl [8] also made the observation (independently but earlier) that
termination and correctness need not be intertwined for nested functions. In [7], he
shows that, if nested termination conditions can be proved by the specified
induction theorem for a nested function, then such a proof is sound. In the same
paper, he describes a powerful method for automatically proving
termination of nested functions (it can prove the termination of the 91 function).
Giesl’s work is presented in the setting of first order logic and uses such
notions as call-by-value evaluation on ground terms; in addition his theorems are
justified meta-theoretically. In contrast, our definitions, being total functions in
classical logic, are oblivious to evaluation strategy, and can moreover be higher
order and schematic. Since our derivations all proceed by object-logic deduction
in a sound logic, we need make no soundness argument.

Kapur and Subramaniam [9] show how the RRL proof system can use its
cover set induction method to tackle the automation of nested and mutual
induction.

Researchers in Type Theory have evolved several means of dealing with
nested recursion; Nordström uses accessibility relations to increase the power
of Martin-Löf Type Theory for expressing general recursions, including nested
recursion [12]. Recently, the power of dependent types has been used to give a
structural induction termination proof of the Unify function [11].


8 Conclusions and Future Work 


In this paper, we have covered two main methods for dealing with nested
recursion: the first requires termination relations to be given at definition time; the
second allows the termination relation to be given at the user’s convenience. We
have shown how schematic equations, where the termination relation can’t be
given until the parameters are instantiated, reduce to the latter method.

We hope our techniques serve to de-mystify the delicate business of dealing
with nested recursion. In particular, it is not always true that termination and
correctness need to be proved together for such recursions; that practice is better
viewed as an instance of the well-known phenomenon of modifying the goal so as
to have stronger inductive hypotheses.

The foundationally inclined reader may feel quite ill after our barrage of
invocations of the Select Axiom. It would therefore be interesting to try to
find a way to make our definitions under the assumption that a satisfactory
termination relation existed.

We regard our results on relationless definition of nested recursion as only
partly satisfactory. The specified recursion equations and induction theorem are
automatically derived, which is good; however, the termination proof using the
provisional induction theorem and recursion equations for the auxiliary function
is usually clumsy and hard to explain.


Relationless Nested Recursion: Derivation


Given a nested recursion


\[
\begin{aligned}
f\,(\mathit{pat}_1) &= \mathit{rhs}_1[f]\\
&\ \,\vdots\\
f\,(\mathit{pat}_n) &= \mathit{rhs}_n[f]
\end{aligned}
\]


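the machinery of TFL performs the same initial steps as a non-nested relationless
definition, viz., translates patterns, instantiates the recursion theorem, performs
β-reduction, specializes the patterns, reduces the cases of the function, and
finally performs termination condition extraction to arrive at


\[
\begin{aligned}
&[f = \mathsf{WFREC}\,R\,(\lambda f\,x.\,M),\ \mathsf{WF}(R),\ \mathit{TC}_{11}[f], \ldots, \mathit{TC}_{1k_1}[f]] \vdash f\,(\mathit{pat}_1) = \mathit{rhs}_1[f]\\
&\qquad\vdots\\
&[f = \mathsf{WFREC}\,R\,(\lambda f\,x.\,M),\ \mathsf{WF}(R),\ \mathit{TC}_{n1}[f], \ldots, \mathit{TC}_{nk_n}[f]] \vdash f\,(\mathit{pat}_n) = \mathit{rhs}_n[f]
\end{aligned}
\tag{33}
\]


Note that R is a variable not occurring in the original equations. Now the
auxiliary function is defined:


aux R = WFREC R (λf x. M). (34)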


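Now the substitution $f \mapsto \mathsf{aux}\,R$ can be made in the theorems from (33), and the
‘definitional assumption’ can consequently be eliminated. Notice that the
replacement of f takes place in the hypotheses as well, since the nested termination
condition will be found there.


\[
\begin{aligned}
&[\mathsf{WF}(R),\ \mathit{TC}_{11}[\mathsf{aux}\,R], \ldots, \mathit{TC}_{1k_1}[\mathsf{aux}\,R]] \vdash \mathsf{aux}\,R\,(\mathit{pat}_1) = \mathit{rhs}_1[\mathsf{aux}\,R]\\
&\qquad\vdots\\
&[\mathsf{WF}(R),\ \mathit{TC}_{n1}[\mathsf{aux}\,R], \ldots, \mathit{TC}_{nk_n}[\mathsf{aux}\,R]] \vdash \mathsf{aux}\,R\,(\mathit{pat}_n) = \mathit{rhs}_n[\mathsf{aux}\,R]
\end{aligned}
\tag{35}
\]

Now consider a relation chosen to meet the termination conditions of aux:

\[
\begin{aligned}
\varepsilon R.\ \mathsf{WF}(R) &\land \forall(\mathit{TC}_{11}[\mathsf{aux}\,R]) \land \cdots \land \forall(\mathit{TC}_{1k_1}[\mathsf{aux}\,R])\\
&\ \,\vdots\\
&\land \forall(\mathit{TC}_{n1}[\mathsf{aux}\,R]) \land \cdots \land \forall(\mathit{TC}_{nk_n}[\mathsf{aux}\,R])
\end{aligned}
\tag{36}
\]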


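Call this term $\varepsilon R.\mathit{TC}$. Notice that this is a closed term. Now define the intended
function:


\[
f = \mathsf{aux}\,(\varepsilon R.\mathit{TC}). \tag{37}
\]

We now bridge the gap between the auxiliary definition and the intended function
by making the substitution $R \mapsto \varepsilon R.\mathit{TC}$ in the recursion equations for aux from
(35):


\[
\begin{aligned}
&[\mathsf{WF}(\varepsilon R.\mathit{TC}),\ [R \mapsto \varepsilon R.\mathit{TC}](\mathit{TC}_{11}[\mathsf{aux}\,R]),\ \ldots,\ [R \mapsto \varepsilon R.\mathit{TC}](\mathit{TC}_{1k_1}[\mathsf{aux}\,R])]\\
&\qquad \vdash \mathsf{aux}\,(\varepsilon R.\mathit{TC})\,(\mathit{pat}_1) = [R \mapsto \varepsilon R.\mathit{TC}](\mathit{rhs}_1[\mathsf{aux}\,R])\\
&\qquad\vdots\\
&[\mathsf{WF}(\varepsilon R.\mathit{TC}),\ [R \mapsto \varepsilon R.\mathit{TC}](\mathit{TC}_{n1}[\mathsf{aux}\,R]),\ \ldots,\ [R \mapsto \varepsilon R.\mathit{TC}](\mathit{TC}_{nk_n}[\mathsf{aux}\,R])]\\
&\qquad \vdash \mathsf{aux}\,(\varepsilon R.\mathit{TC})\,(\mathit{pat}_n) = [R \mapsto \varepsilon R.\mathit{TC}](\mathit{rhs}_n[\mathsf{aux}\,R])
\end{aligned}
\tag{38}
\]

Since R did not occur in any of the original right hand sides, and also because
the definition of f has no free variables, it is valid to replace $\mathsf{aux}\,(\varepsilon R.\mathit{TC})$ by f
on the right hand sides of (38):


\[
\begin{aligned}
&[\mathsf{WF}(\varepsilon R.\mathit{TC}),\ [R \mapsto \varepsilon R.\mathit{TC}](\mathit{TC}_{11}[\mathsf{aux}\,R]),\ \ldots,\ [R \mapsto \varepsilon R.\mathit{TC}](\mathit{TC}_{1k_1}[\mathsf{aux}\,R])]\\
&\qquad \vdash f\,(\mathit{pat}_1) = \mathit{rhs}_1[f]\\
&\qquad\vdots\\
&[\mathsf{WF}(\varepsilon R.\mathit{TC}),\ [R \mapsto \varepsilon R.\mathit{TC}](\mathit{TC}_{n1}[\mathsf{aux}\,R]),\ \ldots,\ [R \mapsto \varepsilon R.\mathit{TC}](\mathit{TC}_{nk_n}[\mathsf{aux}\,R])]\\
&\qquad \vdash f\,(\mathit{pat}_n) = \mathit{rhs}_n[f]
\end{aligned}
\tag{39}
\]


It is important to abstain from performing this replacement in the assumptions.
All that is required now is to finesse the assumptions, and that can be achieved
by use of the Select Axiom:


\[
\begin{aligned}
&[\mathsf{WF}(R),\ \forall(\mathit{TC}_{11}[\mathsf{aux}\,R]),\ \ldots,\ \forall(\mathit{TC}_{nk_n}[\mathsf{aux}\,R])]\\
&\vdash\\
&\mathsf{WF}(\varepsilon R.\mathit{TC}) \land \forall([R \mapsto \varepsilon R.\mathit{TC}](\mathit{TC}_{11}[\mathsf{aux}\,R])) \land \cdots \land \forall([R \mapsto \varepsilon R.\mathit{TC}](\mathit{TC}_{nk_n}[\mathsf{aux}\,R]))
\end{aligned}
\tag{40}
\]

By invoking the Cut rule with (39) and (40), the final result is obtained:

\[
\begin{aligned}
&[\mathsf{WF}(R),\ \forall(\mathit{TC}_{11}[\mathsf{aux}\,R]),\ \ldots,\ \forall(\mathit{TC}_{nk_n}[\mathsf{aux}\,R])] \vdash f\,(\mathit{pat}_1) = \mathit{rhs}_1[f]\\
&\qquad\vdots\\
&[\mathsf{WF}(R),\ \forall(\mathit{TC}_{11}[\mathsf{aux}\,R]),\ \ldots,\ \forall(\mathit{TC}_{nk_n}[\mathsf{aux}\,R])] \vdash f\,(\mathit{pat}_n) = \mathit{rhs}_n[f]
\end{aligned}
\]


Abstract. This article provides evidence for the arrival of automated
reasoning. Indeed, one of its primary goals of the early 1960s has been
reached: The use of an automated reasoning program frequently leads to
significant contributions to mathematics and to logic. In addition,
although not clearly an original objective, the use of such a program now
plays an important role for chip design and for program verification.
That importance can be sharply increased; indeed, in this article we
discuss the possible value of automated reasoning to finding better designs
of chips, circuits, and computer code. We also provide insight into the
mechanisms (in particular, strategy) that have led to numerous successes.
To complement the evidence we present and to encourage further
research, we offer challenges and open questions for consideration. We
include a glimpse of the future and some commentary on the possibly
unexpected benefits of automating the search for answers to open
questions.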


1 An Unlikely But Realized Dream 


In the late 1940s, researchers (including the logician Łukasiewicz) considered
the possibility of mechanically checking a proof to be within reach. At the other
end of the spectrum, some thought mechanical proof finding to be out of the
question. For many, that view was still extant in the early 1960s, when an
audacious effort seriously commenced whose objective was the design of a computer
program that could make significant contributions to mathematics and to logic.
In other words, some researchers (brave or foolish) embarked on a journey whose
destination was proof finding of interesting theorems and, if gold was found,
the automated answering of open questions.

The search for answers to open questions is frequently thought to be the
dominion of mathematics and logic. However, closely related are questions posed
by the designers of circuits and chips and by the writers of computer programs.
Such questions take various forms. Does a given design of a chip or circuit meet its
specifications (is it free of bugs)? Given a bit of computer code (or a given design),
can one find an alternative that offers far more efficiency? In this article, we
make a case for using a program to find better designs, where the methodologies
are taken from our recent successful research in finding shorter proofs; see the
Appendix, which takes the form of a whitepaper.

As for mathematics and logic, as the following list illustrates, open questions 
come in an even greater variety of flavors. 


1. Is every Robbins algebra a Boolean algebra?

2. Is the formula XHN a single axiom for equivalential calculus?

3. Does there exist a circle of pure proofs for the Moufang identities?

4. Does a particular identity that holds in orthomodular lattices also hold in
ortholattices?

5. Does the fragment of combinatory logic with basis consisting of B and M
satisfy the strong fixed point property?

6. Is the formula XCB a single axiom for equivalential calculus?


Answers to the cited questions (and to many, many more of diverse types)
can often be found by relying heavily on an automated reasoning assistant.
Indeed, the first four have already been answered with the help of Argonne’s
powerful automated reasoning programs [3,7,6,1]; for a discussion of the two
open questions, 5 and 6, see Sect. 2. A delightful bonus: To enlist the aid of
a reasoning program in the search for answers, one need not be an expert.
Today, various mathematicians only vaguely familiar with automated reasoning are
using William McCune’s reasoning program OTTER [2] (the program featured
in this article) in their research. (If guidance is desired in the use of OTTER, if
a fuller understanding of the elements of automated reasoning is the objective,
or if one seeks open questions to attack, each is easily within reach through
consulting the new book A Fascinating Country in the World of Computing: Your
Guide to Automated Reasoning [7]; its included CD-ROM is a gold mine. If one
wishes to browse in a dense forest of once-open questions answered with the use
of OTTER, the monograph by McCune and Padmanabhan [4] is the choice.)

Complementing the use by mathematicians of automated reasoning and also 
contributing substantially to the realization of the dream of profitably using a 
reasoning program is the use by firms that include AMD and Intel. In particular, 
the cited firms each employ people whose assignment is to prove theorems in 
the context of correctness of various designs. The chief weapon is recourse to 
a reasoning program. Was part of the motivation the remarkable achievement 
of Boyer, Moore, and colleagues in their design and verification of a chip and 
language [5], a chip that was eventually manufactured and used? 


1.1 The Source of Power 


Although for many years OTTER was the fastest reasoning program, programs
now exist that run faster. But (in our view), CPU speed is not the key. Indeed, an
increase in CPU power of a factor of 4 (we conjecture) brings very few theorems
in range that were previously out of range. The obstacle rests with the incredible
size of the space of deducible conclusions.

And here is where the power resides that has led to so many recent successes:
OTTER offers a variety of powerful strategies, some to restrict its reasoning,
some to direct its reasoning, and some to permit the program to emphasize
the role of certain designated information. We strongly conjecture that without
access to various types of strategy, the vast majority of the successes of the past
few years would not have been reached. For a glimpse of how all has changed,
we note that, in contrast to two decades ago, our submission to OTTER of
problems taken from various areas of logic is almost always met with a proof.
(We intend to present many new proofs in print and on an included CD-ROM in
a planned book entitled Automated Reasoning and the Finding of Missing and
Elegant Proofs in Formal Logic.)

Rather than formal definitions, we give the following examples of strategy in 
terms of their objective. 


1. The set of support strategy typically restricts a program from exploring the
space of conclusions that follow from the axioms.

2. The expression complexity strategy restricts a program from considering
terms, formulas, or equations conjectured to interfere with effectiveness
because of their complexity.

3. The variable richness strategy restricts the program from considering
formulas or equations whose number of distinct variables appears to make them
unattractive.

4. The term-avoidance strategy prevents the program from retaining any newly
deduced conclusion that contains a term in the class of those designated as
unwanted; see Sect. 3 for a fuller discussion.


In contrast to the preceding four strategies that restrict a program’s
reasoning, the following useful strategies direct its reasoning.


5. The resonance strategy instructs a program to focus on conclusions that 
resemble any of a set designated by the researcher as appealing, in preference 
to all other available conclusions. 

6. The ratio strategy instructs a program to choose k conclusions by complexity, 
1 by first come first serve, then k, then 1, and the like, where k is assigned 
by the researcher. 

7. The weighting strategy directs a program to focus on items whose complexity 
is smallest, where the complexity is in part determined by user-assigned 
values. 
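The interleaving described for the ratio strategy (item 6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not OTTER's implementation: clauses are plain strings, complexity defaults to string length, and the function name ratio_order and its parameters are hypothetical.

```python
def ratio_order(clauses, k, weight=len):
    """Order in which a ratio-k selection would pick the given clauses.

    clauses: items in arrival (first come) order.
    k:       number of picks by smallest complexity per first-come pick.
    weight:  complexity measure (assumed here to be the string length).
    """
    remaining = list(clauses)
    picked = []
    while remaining:
        for _ in range(k):                      # k picks by complexity
            if not remaining:
                break
            best = min(remaining, key=weight)   # ties go to the earliest arrival
            remaining.remove(best)
            picked.append(best)
        if remaining:                           # then 1 pick by first come, first served
            picked.append(remaining.pop(0))
    return picked

order = ratio_order(["pppp", "qq", "r", "sssss", "ttt"], k=2)
print(order)  # ['r', 'qq', 'pppp', 'ttt', 'sssss']
```

The first-come pick is what lets an old but complex conclusion such as "pppp" be considered before lighter conclusions that arrived later; a pure complexity ordering would postpone it.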


Among other factors, additional power is derived from a mechanism to
simplify and canonicalize information and from a variety of inference rules, one of
which treats equality as “understood”. However, in contrast to so many
reasoning programs, (we maintain that) OTTER’s offering a veritable arsenal of
strategies is the key.


2 Open Questions Detailed


At this point, we provide the promised details concerning questions 5 and 6 
posed in Sect. 1. Each, as noted, remains open. 

For the first of the two questions (5), we give the definitions of the
combinators B and M and that of a fixed point combinator F. (Expressions in
combinatory logic are assumed to be left associated unless otherwise indicated.)


Bxyz = x(yz)
Mx = xx
Fx = x(Fx)


Does there exist a fixed point combinator F expressed purely in terms of the
combinators B and M such that Fx = x(Fx)? To provide a small taste of the
nature of this question, we note that BML is a fixed point combinator for the
fragment whose basis consists solely of B, M, and L, where Lxy = x(yy).
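
The claim about BML can be checked by unfolding the three definitions, reading applications as left associated. The derivation below is a worked check that uses only the equations for B, M, and L given above:

\[
\begin{aligned}
BMLx &= M(Lx) && \text{by } Bxyz = x(yz)\\
     &= (Lx)(Lx) && \text{by } Mx = xx\\
     &= x((Lx)(Lx)) && \text{by } Lxy = x(yy)\\
     &= x(M(Lx)) && \text{by } Mx = xx, \text{ right to left}\\
     &= x(BMLx) && \text{by } Bxyz = x(yz), \text{ right to left.}
\end{aligned}
\]

Hence BMLx = x(BMLx), which is the fixed point equation Fx = x(Fx) with F = BML.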

For the second question (6) that remains open, we give two definitions, the 
second of which is included for pedagogical reasons. 


e(x,e(e(e(x,y),e(z,y)),z)) % XCB
e(x,e(e(y,z),e(e(z,x),y))) % XHN


Is the formula XCB a single axiom for all of equivalential calculus? The formula
XHN is a single axiom. If one were able to deduce some known single axiom,
such as XHN, starting with XCB, then one would have established that XCB
is also a single axiom. On the other hand, if the goal is to disprove the implied
theorem, then an obvious approach is to find an appropriate counterexample in
the form of a model.


3 From Mathematics and Logic to Design 


Astounding to us, many researchers in the field do not share our enthusiasm for 
attacking open questions via automated reasoning. Our eagerness is of course 
based in part on the desire to know the answer: Is there a proof, or is there 
a counterexample in the form of a model? A glance at our research clearly 
establishes our preference for areas of mathematics and logic. 

In addition to contributing to mathematics and to logic, however, our studies
of the automation of an attack on open questions have two important benefits.
First, because sometimes we fail in our early attempts, we are forced to formulate
and then implement new approaches, methodologies, and strategies. We always
aim at generality. Therefore, independent of finding an answer to an open
question, we produce mechanisms that increase (in a significant manner) the power
of reasoning programs. That increase in turn brings into range additional open
questions, and the loop continues.

A second benefit is that some of the new approaches, methodologies, and
strategies appear to offer much for design and verification. To show how this
might be true, we note that another class of open questions exists, outside of the
usually cited classes. That class can be termed missing proofs. For one example,
if the only proofs of a given theorem are by induction or by some other
metaargument, then an axiomatic proof is missing. For a second example, if a theorem
is announced without proof by a master (which removes any doubt about its
truth), then again a proof is missing. For a third example (of the many types
we have identified), and an example that is pertinent to design, if a proof is
in hand and the conjecture is that a rather shorter proof exists but has not yet
been found, then again a proof is missing.


In the past two years, we have devoted substantial effort to finding missing
proofs, heavily emphasizing the search for shorter proofs. The term-avoidance
strategy and the resonance strategy have played key roles in our numerous
successes. Those studies, in our view, could be put to great use in design. Indeed,
imagine that one has in hand a design of a chip, a circuit, or a computer program. 
Our approach would be to obtain a constructive proof with OTTER of the given 
design and then (in the context of the resonance strategy) use the deduced steps 
of that proof in search of a better design. Then, as part of our effort and for a 
tiny taste of what we would do next, we would instruct the program to avoid 
the use of each of the deduced steps (one at a time) to see whether a shorter 
proof could be found. 


Although clearly nothing like an isomorphism or a guarantee, the shorter the 
proof, the simpler the constructed object—chip, circuit, or program. A simpler 
object (everything being equal) is easier to verify, is more reliable, makes better 
use of energy, and produces less heat. In other words (with almost all of the 
details omitted), we suspect that design and verification would benefit from 
adapting our recent research. 


Important to note is an easily overlooked subtlety when a shorter proof is 
the goal. Imagine that the goal is to find a proof (shorter than that in hand) of 
Q and R and S. As the attack proceeds, shorter and still shorter proofs may 
be found of, say, S, which might lead one to the conclusion that ever-shorter 
proofs of the conjunction are in the making. Quite often, such is not the case. 
Indeed, a shorter proof of a member of the conjunction may be such that the 
omitted steps (from the longer proof) are useful in the total proof, where their 
replacements serve no other purpose than that of producing a shorter proof of 
the cited member. The situation in focus may be familiar to programmers or 
circuit designers. Reliance on a subroutine with fewer instructions or reliance on 
a subcircuit with fewer components does not necessarily add efficiency for the 
larger program or circuit. This amusing subtlety makes the finding of shorter 
proofs far more difficult than it might at first appear. 


As part of our recent studies, we have also focused on term avoidance. The
decision to employ the strategy might be based on mere curiosity (as so often
occurs in mathematics and in logic), or it might be based on practical
considerations (as frequently occurs in design and verification). Indeed, regarding the
former, one might wonder about the existence of a proof in which nested
negation is forbidden, a proof in which no deduced step contains a term of the form
n(n(t)) for any term t where the function n denotes negation. As for the latter,
because of economy or efficiency, one might wish to seek an object in which
some type of component or instruction is absent. For example, one might wish
to avoid the use of NOR gates.

In contrast to the just-cited motivations for use of the term-avoidance
strategy, we find that its use markedly increases the likelihood of success in the
context of automating the search for answers to open questions. The explanation
is quite subtle; indeed, adding a constraint might on the surface make finding a
proof much harder. Note that the space of deducible conclusions can grow
exponentially as a program’s attack proceeds. This property is directly addressed by
(apparently arbitrarily) choosing a type of term to be avoided and then
preventing the program from venturing into the subspace of deducible conclusions each
of which contains one or more occurrences of such a term. Of course, depending
on the type of term to be avoided, one’s intuition might balk. For example, in
the case of avoiding nested negation, one might understandably doubt that the
objective can be reached. However, we have almost always succeeded in the
presence of this constraint. Perhaps the explanation rests with (1) the existence of
many, many more proofs than one might expect and (2) the removal of
(apparently) distracting information. Put another way, by avoiding conclusions of a
specified type, it appears that the density of good information within that which
is retained is sharply increased.
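
The filtering step behind term avoidance can be illustrated with a short Python sketch for the nested-negation case. It is an illustration only and makes several assumptions: terms are encoded as nested tuples whose first element is the function symbol, variables are strings, n denotes negation, i is a placeholder binary symbol, and the names has_nested_negation and retain are hypothetical rather than taken from OTTER.

```python
def has_nested_negation(term):
    """True if term contains a subterm of the form n(n(t))."""
    if isinstance(term, str):                 # a variable or constant
        return False
    head, *args = term
    if head == "n" and isinstance(args[0], tuple) and args[0][0] == "n":
        return True
    return any(has_nested_negation(a) for a in args)

def retain(conclusions):
    """Keep only the conclusions free of the unwanted class of terms."""
    return [c for c in conclusions if not has_nested_negation(c)]

deduced = [
    ("i", ("n", "x"), ("n", "y")),            # i(n(x), n(y)): kept
    ("i", ("n", ("n", "x")), "x"),            # i(n(n(x)), x): discarded
    ("n", ("i", "x", ("n", "y"))),            # n(i(x, n(y))): kept
]
kept = retain(deduced)
print(len(kept))  # 2
```

Because a discarded conclusion is never used as a parent, everything that could only be deduced from it is also never generated, which is how a small syntactic test removes a whole subspace of conclusions.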

In view of our recent successes, we conjecture that those involved in design 
and verification might benefit from our various methodologies focusing on finding 
proofs in which some class of terms is absent. 


4 The Future 


We have focused on how our recent successes in finding proofs with an
automated reasoning program are potentially valuable to design and validation. In
the near-term future, perhaps some firm will submit to us a design in the clause
language OTTER employs. Also required is the property that OTTER can
produce a constructive proof that the design meets the specifications. We would
then attempt to simplify the proof: fewer steps, less complex expressions, and
perhaps the avoidance of certain classes of term. If success were to occur, the
firm might then have a better design.

We also envision in the future a rather odd use of a type of parallelism. 
Specifically, the likelihood of success regardless of the goal (we conjecture) would 
be sharply increased if one had access to a large network of computers. When 
the objective was identified, the set of computers (perhaps 10,000) would each 
separately attack the problem, each in a manner somewhat different from the 
rest. For example, each might employ a different bound on the complexity of 
retained information, a different value for the parameter that governs the actions 
of the ratio strategy, a different set of resonators, and the like. All members of the 
set of assigned computers would simultaneously attack the problem. Currently, 
we rely on a minuscule version of this approach, sometimes resulting in success.
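
The idea of many machines, each attacking the problem with different settings, can be sketched in Python. This is only an illustration: `max_weight` is the OTTER parameter bounding the complexity of retained information, and `pick_given_ratio` is assumed here to be the parameter governing the ratio strategy; the values, the resonator-set labels, and the helper `make_portfolio` are hypothetical.

```python
from itertools import product

def make_portfolio(max_weights, ratios, resonator_sets):
    """Return one configuration per combination of the supplied settings."""
    return [
        {"max_weight": w, "pick_given_ratio": r, "resonators": s}
        for w, r, s in product(max_weights, ratios, resonator_sets)
    ]

# Hypothetical values; each configuration would be handed to a separate machine.
portfolio = make_portfolio(
    max_weights=[16, 24, 32, 48],
    ratios=[2, 3, 4, 6],
    resonator_sets=["set_a", "set_b", "none"],
)
print(len(portfolio))  # 4 * 4 * 3 = 48 distinct configurations
```

Scaling the three lists up is what would turn this small set into the thousands of distinct attacks envisioned above.
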

In summary, we estimate the current state of automated reasoning to be fifty 
years ahead of what an optimist might have predicted but twenty years ago. The 
explanation rests mainly with the formulation of new and diverse strategies. For 
dramatically greater advances, we conjecture that strategy still holds the key. 


1 Setting the Stage 


This article focuses on an esoteric but practical use of automated reasoning that 
may indeed be new to many, especially those concerned primarily with verifica- 
tion of both hardware and software. Specifically, featured are a discussion and 
some methodology for taking an existing design—of a circuit, a chip, a pro- 
gram, or the like—and refining and improving it in various ways. (Although 
the methodology is general and does not require the use of a specific program, 
McCune’s program OTTER does offer what is needed. OTTER has played and 
continues to play the key role in my research, and an interested person can gain 
access to this program in various ways, not the least of which is through the 
included CD-ROM in [3].) When success occurs, the result is a new design that 
may require fewer components, avoid the use of certain costly components, offer 
more reliability and ease of verification, and, perhaps most important, be more 
efficient in the contexts of speed and heat generation. Although I have minimal 
experience in circuit design, circuit validation, program synthesis, program ve- 
rification, and similar concerns, (at the encouragement of colleagues based on 
successes to be cited) I present material that might indeed be of substantial 
interest to manufacturers and programmers. 

I write this article in part prompted by the recent activities of chip designers 
that include Intel and AMD, activities heavily emphasizing the proving of theo- 
rems. As for my research that appears to me to be relevant, I have made an 
intense and most profitable study of finding proofs that are shorter [2,3], some 
that avoid the use of various types of term, some that are far less complex than 
previously known, and the like. Those results suggest to me a strong possible 
connection between more appealing proofs (in mathematics and in logic) and 
enhanced and improved design of both hardware and software. Here I explore 
diverse conjectures that elucidate some of the possibly fruitful connections. 

The strongest argument opposed to what I discuss in this article rests on 
the great amount of money, time, energy, and expertise that has been devoted 
to design and related activities. Indeed, one might understandably suspect that 
such experts already know how to produce superb and often minimal designs. 
(As a counterargument, I note that the proofs found by OTTER, applying the 
methodology that has been developed as part of my research, often are start- 
lingly unlike those a person might find. Perhaps more important, many of the 
proofs that are found are in various ways more appealing than the literature 
offers.) However, a test of what is featured here is inexpensive, and, if the result 
is positive, the reward might be immense. The test consists of some expert first 
supplying a set of graduated-in-complexity designs and the proofs that they meet 
specifications. Perhaps the designs supplied would already have been maximized 
for good properties. Second, if I am to be involved, part of the test requires sup- 
plying the proofs in the clause notation, the notation used by OTTER. Perhaps 
Mathematica could produce the needed translations. Then I would take the pro- 
ofs (in clause notation) and attempt to shorten them or improve them in some 
other aspect discussed here. If I found better proofs, which I have in Boolean 
algebra (quite related to circuit design), I would submit them for evaluation. 

If I were not involved, one might consult the book [3], which offers the pro- 
gram and much of what is needed to use it. Such has indeed been the case for 
areas of mathematics and of logic. Therefore, perhaps a new type of design would 
emerge. Your cost is that of producing an OTTER input file, which appears to 
require but a few days of time from a person who knows about design; my cost 
is research time devoted to an attempt to improve a given design. Sometimes 
such research time and effort lead to a set of solutions, each of which could be 
evaluated by an expert for its properties. 

Also addressed in this article is the concern of design from scratch, that case 
in which no design exists to be modified, extended, and improved upon. I conjec- 
ture that my research and that of colleagues that has culminated in answers to 
diverse open questions will prove pertinent. After all, producing a design from 
scratch answers the corresponding open question concerning its existence. In- 
deed, quite different from the task of finding “nicer” and more desirable proofs 
is the task of answering open questions. 

I shall review without technical details various approaches I and colleagues 
take for finding “better” proofs and answering open questions, and I claim that 
many of the approaches will prove useful to manufacturing, at least eventually. 
The explicit and implicit claims and conjectures should be viewed most critically, 
in view of my lack of expertise in design and synthesis. I will be content with 
merely sketching diverse ideas. I will also include observations that might seem 
too obvious to state, but are included to remove ambiguity. 

To complete the stage setting, I give a foretaste of what is to come. Consider 
the following circuit-design problem (known as the two-inverter puzzle) [3], one 
that I myself would not have solved, but McCune’s program OTTER did solve. 


Using as many AND and OR gates as you like, but using only two NOT 
gates, can you design a circuit according to the following specification? 
There are three inputs, i1, i2, and i3, and three outputs, o1, o2, and o3. 
The outputs are related to the inputs in the following simple way: 


o1 = not(i1)    o2 = not(i2)    o3 = not(i3). 


Remember, you can use only two NOT gates! 


The fact that an automated reasoning program was able to design the desired 
circuit hints at what might be possible in the context of synthesizing circuits 
from scratch; see Section 3. 
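
For readers who wish to check that the puzzle is solvable, the following Python sketch tests one widely known solution over all eight input combinations. It is not claimed to be the circuit OTTER produced; the function name and the signal names are hypothetical. The trick is to spend the two NOT gates on "at least two inputs are 1" and on "an odd number of inputs are 1", and to build everything else from AND and OR.

```python
from itertools import product

def two_inverter_circuit(i1, i2, i3):
    """Invert three inputs using AND/OR freely but only two NOT gates."""
    not_count = 0

    def NOT(x):
        nonlocal not_count
        not_count += 1
        return not x

    two_or_three = (i1 and i2) or (i1 and i3) or (i2 and i3)
    zero_or_one = NOT(two_or_three)                  # first NOT gate
    exactly_one = zero_or_one and (i1 or i2 or i3)
    one_or_three = exactly_one or (i1 and i2 and i3)
    zero_or_two = NOT(one_or_three)                  # second NOT gate
    exactly_zero = zero_or_two and zero_or_one
    exactly_two = zero_or_two and two_or_three

    def inverted(other_a, other_b):
        # An input is 0 when no input is 1, when the single 1 is one of the
        # other two inputs, or when both 1s are the other two inputs.
        return (exactly_zero
                or (exactly_one and (other_a or other_b))
                or (exactly_two and (other_a and other_b)))

    o1 = inverted(i2, i3)
    o2 = inverted(i1, i3)
    o3 = inverted(i1, i2)
    return o1, o2, o3, not_count

# Exhaustive check of the specification o1 = not(i1), o2 = not(i2), o3 = not(i3).
for i1, i2, i3 in product([False, True], repeat=3):
    o1, o2, o3, nots = two_inverter_circuit(i1, i2, i3)
    assert (o1, o2, o3) == (not i1, not i2, not i3)
    assert nots == 2
```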

In the context of finding better circuits (see Section 2), imagine that a person 
or a program succeeds in solving the two-inverter puzzle, but the solution absurdly 
contains as a subexpression the OR of i1 and i1. In other words, assume 
that the cited subexpression is not needed, that an unneeded OR gate is present. 
The methodology presented in this article might quickly enable a program, given 
the unwanted solution, to find a better one, omitting the extra OR gate. 

Still with the focus on the two-inverter puzzle, in the context of term avoi- 
dance (see Section 4), imagine that the first solution that is offered contains 
NOT(NOT(i3)). Of course, a canonicalization rule could be applied to replace 
the apparently unnecessary cited expression with i3. Far better and in the spirit 
of the corresponding methodology to be touched upon, the program could be 
instructed to avoid retention of all expressions containing NOT(NOT(t)) for any 
term t. Possibly not obvious, such avoidance can contribute markedly to program 
effectiveness; indeed, unwanted conclusions can lead to much wandering—for a 
program, or for a person. 
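
The avoidance test itself is simple to state. The Python sketch below is an illustration only and does not reproduce OTTER's mechanism; terms are represented (by assumption) as nested tuples such as `("not", ("not", "i3"))`, and the function names are hypothetical.

```python
def contains_nested_negation(term, neg="not"):
    """Return True if some subterm has the form neg(neg(t))."""
    if isinstance(term, str):          # a constant or a variable
        return False
    functor, *args = term
    if (functor == neg and args
            and not isinstance(args[0], str)
            and args[0][0] == neg):
        return True
    return any(contains_nested_negation(a, neg) for a in args)

def retain(conclusions):
    """Keep only the conclusions free of nested negation."""
    return [c for c in conclusions if not contains_nested_negation(c)]

unwanted = ("or", "i1", ("not", ("not", "i3")))
wanted = ("not", ("and", ("not", "i1"), "i2"))
print(retain([unwanted, wanted]) == [wanted])  # True
```

Note that the second term contains two negations, but neither is applied directly to the other, so it is retained.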

In contrast to the discussion focusing on combinational circuits, clearly a 
focus on sequential design in which time and delay are factors presents distinctly 
different and difficult problems to solve. Although I can at this moment offer 
little advice in that regard, in that my research has never dealt with this aspect, 
I nevertheless conjecture that the preceding discussion will, for some, suggest 
what is more than conceivable and perhaps promising. 


2 Shorter Proofs in Relation to Improved Design 


Although by no means does a one-to-one correspondence exist, it seems patently 
clear that (in the following sense) a strong correlation does exist between proof 
length and simplicity of design. Consider two proofs A and B, source unspecified, 
each intended to construct the same object (such as a circuit). Assume (in this 
hypothetical case) that the length of A is moderately to sharply less than the 
length of B. Finally, assume that the (automated reasoning) program in use offers 
an ANSWER literal (to display the constructed object) and that the program 
finds both proofs. 

Quite often, although certainly not always, the object displayed when A is 
completed is preferable to that displayed when B is completed in the sense that 
it relies on fewer components. Therefore, it seems quite reasonable to conjecture 
that a methodology for finding shorter proofs might indeed be of interest in the 
design of circuits or chips or the synthesis of programs. Moreover, a simpler (in 
the sense under discussion) object in general is easier to verify, less difficult to 
show that the specifications are met. My research has produced such a metho- 
dology, one that has been applied successfully again and again in mathematics 
and in logic (although quite often no shorter proof is yielded). (Section 6.7 of 
[3] discusses the latest methodology I have formulated for systematically seeking 
shorter proofs.) 

In the context of finding shorter proofs in my own research, one of the more 
satisfying results concerned finding a 100-step proof, where I was presented for a start 
with an 816-step proof. The theorem in focus was one from Boolean algebra, a 
field relevant in various ways to circuit design. The approach I took did indeed, 
at the beginning, rely on the supplied 816-step proof. Further, at each stage in 
the process aimed at finding a shorter and then still shorter proof, the program 
keyed on the completed proof at an earlier stage. 

The notion I suggest that might be of interest asserts that a program could 
be given a design (circuit, chip, program) whose corresponding proof that the 
specifications were met was in hand. The cited approach emphasizes the role of 
the steps of the proof in hand, preferring formulas or equations that are similar 
to one of the steps. Indeed, with a strategy known as the resonance strategy, the 
proofs that are found along the way—shorter and shorter, if all is going well— 
play a vital role. Another aspect of the approach concerns blocking the use of 
various steps of a given proof with the intention that, not only will such blocked 
steps be absent, but a shorter proof will emerge. (My preferred approach to 
blocking the use of a step rests on the use of demodulation, a procedure normally 
used for simplification and canonicalization.) 

Also of interest and quite curious is the fact that, occasionally, a shorter 
proof has the property that all of its steps are among those of the somewhat lon- 
ger proof being used by the resonance strategy. The explanation rests with the 
fact that the program finds new ways of connecting already-used items, ignoring 
others totally, and succeeding in completing a proof. For example, sometimes 
the program can use the fifth step with the twelfth step to deduce the twentieth 
step, which in the longer proof was obtained from the eighteenth and nineteenth, 
and discover that the eighteenth and nineteenth steps can be ignored. The cor- 
respondence for design would be the use of some, but not all, of the components 
of an existing design with (so to speak) a rewiring, without the introduction of 
new components. 
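
The rewiring phenomenon can be made concrete with a toy example in Python. The proof below is hypothetical (it is not one of the proofs discussed in this article); each deduced step is mapped to the items from which it was obtained, and input items such as `"a1"` have no entry.

```python
def steps_needed(goal, parents):
    """Collect the deduced steps on which `goal` depends, including itself."""
    needed, stack = set(), [goal]
    while stack:
        step = stack.pop()
        if step in parents and step not in needed:
            needed.add(step)
            stack.extend(parents[step])
    return needed

# Hypothetical longer proof: step 20 is obtained from steps 18 and 19.
longer = {5: ("a1", "a2"), 12: (5, "a1"), 17: (12, "a2"),
          18: (17, 5), 19: (18, "a1"), 20: (18, 19)}

# Rewired proof: step 20 is instead obtained from steps 5 and 12.
rewired = dict(longer)
rewired[20] = (5, 12)

long_steps = steps_needed(20, longer)
short_steps = steps_needed(20, rewired)
print(len(long_steps), len(short_steps))  # 6 3
print(short_steps <= long_steps)          # True: no new steps were introduced
```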


3 New Proofs in Relation to Radically New Designs 


In contrast to the preceding section in which the object is to take an existing 
design and improve upon it, here the focus is on finding the desired object 
from scratch. In such a case, often, no clue exists concerning the nature of the 
corresponding proof whose ANSWER literal, if successful, will display the object. 
Starting from scratch, no surprise, is far more difficult than beginning with an 
existing object and its corresponding proof. Nevertheless, I and my colleague 
Branden Fitelson are very encouraged by our various successes with finding a 
proof where no clue concerning its nature was available [1]. 

As for the word “radically” occurring in the title of this section, it was not 
used lightly. The proofs yielded by applying the various methodologies relying 
on OTTER’s arsenal of weapons are (so it strongly appears) sharply unlike what 
a person might produce. For example, in fields of logic, the literature steadfastly 
offers numerous proofs relying heavily on the use of terms of the form n(n(t)) for 
various terms t, where the function n denotes negation (not). In contrast to the 
literature and the implicit view that such double-negation terms are virtually 
required, I have found (through heavy use of OTTER) numerous proofs avoiding 
such terms. More important, the methodology is general—not tuned to any 
specific type of term, such as that involving negation. 

For a second example, where a researcher might understandably shy away 
from considering a messy and complex formula, equation, or expression, a reaso- 
ning program finds little discomfort in its consideration. Indeed, equations with 
more than 700 symbols present no problem for OTTER. Simply put and without 
explanation, the attack taken by a powerful automated reasoning program often 
resembles that taken by an unaided researcher in few if any ways. Rather than 
a disadvantage, (it seems to me) this divergence in attack accounts for many 
marked successes. I conjecture (with some trepidation) that, if the goal were 
a radically new design, an expert might be greatly rewarded by adding as an 
assistant a program such as OTTER. 

One key aspect of the methodology OTTER applies when seeking a proof 
where none is in hand is reliance on the already-cited resonance strategy, but 
reliance in a slightly different manner. Specifically, what amounts to patterns 
corresponding to steps that proved useful in related proofs are included in the 
input. Often very few of those correspondents (resonators) are present in the 
proof that results when successful, and often not many more of its steps match 
one of the resonators. Naturally, the question then arises concerning how such 
inclusions help. With a new proof, I suspect that those few of its steps that 
are either one of the actual patterns or match a resonator (in a manner where 
variables are treated as indistinguishable) provide the keys to getting around 
narrow corners, over wide plateaus, and the like (speaking metaphorically). In 
other words, without the guidance offered by the included resonators, success 
would not occur. The idea is similar to the case in which a colleague provides a 
few vital hints, even if that colleague cannot solve the actual problem. 
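
Matching a step against a resonator with variables treated as indistinguishable can be sketched in Python as follows. The sketch assumes terms are nested tuples and that a symbol beginning with one of the letters u through z is a variable; the function names are hypothetical, and the code illustrates the matching idea only, not how OTTER uses a match to guide its search.

```python
def is_variable(symbol):
    """Assumed convention: names beginning with u through z are variables."""
    return symbol[0] in "uvwxyz"

def pattern(term):
    """Replace every variable by the same placeholder, keeping the shape."""
    if isinstance(term, str):
        return "*" if is_variable(term) else term
    functor, *args = term
    return (functor, *[pattern(a) for a in args])

def matches_resonator(term, resonator):
    """True if the two terms agree once all variables are identified."""
    return pattern(term) == pattern(resonator)

resonator = ("P", ("i", "x", ("i", "y", "x")))
print(matches_resonator(("P", ("i", "u", ("i", "v", "w"))), resonator))  # True
print(matches_resonator(("P", ("i", "x", ("n", "y"))), resonator))       # False
```

The first candidate matches even though it uses three distinct variables where the resonator uses two, because only the functional shape is compared.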


4 Term-Avoidance Proofs in Relation to Design 


The avoidance of terms, such as those in the double-negation class, is somewhat 
reminiscent of avoiding the use of some component. Sometimes the desire is for 
minimal but nonzero use of some type of component—as was the case in the 
two-inverter puzzle—but, often, the intent is to never have present some type 
of term or component. For example, OR gates might come into play in some 
fashion during the exploration by person or by program, and yet their actual 
use might be unwanted. As commented earlier, a program such as OTTER can be 
instructed to completely avoid retaining any unwanted conclusion, thus reflecting 
the intent of the user. 


5 Complexity of Proofs 


In this section, in contrast to the preceding in which I was able to give hints about 
a concrete relation between properties of proofs and improvements in design, I 
simply discuss another aspect of my research concerned with proof betterment. 
In other words, I (at the moment) leave to the expert in design, verification, and 
synthesis the extrapolation to other areas. 

One of the sometimes annoying properties of all proofs in hand is unwanted 
complexity of various types. The most obvious type concerns the length of the 
formulas or equations of the deduced proof steps. Simply put, the proofs in hand 
may each be far messier than preferred. Such messiness is not merely an aesthetic 
consideration; indeed, its presence can make the proof harder to follow and may 
suggest that key lemmas (that would reduce the complexity) have as yet not 
been discovered. 

OTTER offers what is needed in the context of deduced-step length, namely, 
a parameter called max_weight. The user can assign a value to this parameter and 
can instruct the program to measure deduced-step complexity purely in terms 
of symbol count. When a new conclusion is drawn whose complexity exceeds the 
user-assigned value to max_weight, the conclusion is immediately discarded. 
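
A minimal Python sketch of the symbol-count test follows. It assumes terms are nested tuples and that every predicate, function, constant, and variable occurrence counts as one symbol; the function names are hypothetical and the code is not OTTER's implementation.

```python
def symbol_count(term):
    """Count every symbol occurrence in a nested-tuple term."""
    if isinstance(term, str):
        return 1
    functor, *args = term
    return 1 + sum(symbol_count(a) for a in args)

def keep(term, max_weight):
    """A conclusion is discarded only when its count exceeds max_weight."""
    return symbol_count(term) <= max_weight

step = ("P", ("i", "x", "x"))       # the formula P(i(x,x))
print(symbol_count(step))           # 4: the symbols P, i, x, x
print(keep(step, 4), keep(step, 3)) # True False
```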

Further, by assigning a small value to max_weight and by including as re- 
sonators expressions corresponding to the steps of an existing proof with even 
smaller assigned values, the user can attempt to force the program to find a sub- 
proof with an intriguing property (discussed earlier). Specifically, to complete a 
proof, the program (in the case under discussion) sometimes finds a proof that 
is shorter than the one in focus such that all of the deduced steps of the shorter 
proof are among those of the proof whose steps are being used to guide the pro- 
gram’s attack. In effect, if successful, the original proof has been (so to speak) 
rewired in a manner that reduces the number of components needed to achieve 
the objective. 

Of a quite different nature in the context of proof complexity is that con- 
cerned with the maximum number of distinct variables found in the deduced 
steps. In particular, for each formula or equation from among the deduced steps, 
a number (integer) corresponding to it can be trivially computed that matches 
the corresponding number of distinct variables present. The formula P(i(x,x)), 
for example, has the number 1 associated with it, one distinct variable even 
though two variables (not distinct) are present. The maximum of the assigned 
numbers to the deduced steps (excluding those that correspond to the input 
or hypotheses) is the maximum number of distinct variables for the proof. Is 
that number (as a measure of complexity) in some important manner related to 
component use or instruction use? 

Fortunately, OTTER offers the appropriate parameter, max_distinct_vars. 
The user can assign a value to this parameter. When a conclusion is deduced, 
before it is retained, the number of distinct variables in it is compared with 
the value of max_distinct_vars and, if it is strictly greater, the new item is immediately 
purged. 
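
The distinct-variable count and the purge test can be sketched in Python. The sketch assumes terms are nested tuples and that a symbol beginning with one of the letters u through z is a variable; the function names are hypothetical.

```python
def distinct_variables(term):
    """Return the set of distinct variables occurring in a term."""
    if isinstance(term, str):
        return {term} if term[0] in "uvwxyz" else set()
    _, *args = term
    found = set()
    for a in args:
        found |= distinct_variables(a)
    return found

def purged(term, max_distinct_vars):
    """A conclusion is purged when its count is strictly greater than the limit."""
    return len(distinct_variables(term)) > max_distinct_vars

print(len(distinct_variables(("P", ("i", "x", "x")))))  # 1
three_vars = ("P", ("i", "x", ("i", "y", "z")))
print(purged(three_vars, 2), purged(three_vars, 3))     # True False
```
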

The use of this parameter can have some unexpected consequences for proof 
betterment and, perhaps, for design enhancement. Indeed, if i is the minimum 
of the various values of the maximum number of distinct variables for the known 
proofs, and if j is assigned to max_distinct_vars with j strictly less than i, then 
the program is forced to pursue a line of study that cannot produce, if successful, 
any of the known proofs. In other words, reminiscent of Section 3, the program 
might complete a radically new proof, find a radically new design. 

Just as a note, other measures of complexity can be nicely and effectively 
studied with OTTER. For but one example, a measure of complexity concerns 
the level of a proof. By definition, the level of the input items that characterize 
the problem is 0, and the level of a deduced item is 1 greater than the maximum 
of the levels of the hypotheses from which it is deduced. This measure is 
pertinent to the depth of the proof tree. 
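
The definition of level translates directly into a short recursive computation, sketched here in Python on a hypothetical five-item proof. Taking the level of the whole proof to be the largest level among its items is an assumption of this sketch, as are the function and variable names.

```python
def levels(parents):
    """Level of each item: 0 for input items, else 1 + max level of its parents."""
    memo = {}

    def level(item):
        if item not in memo:
            ps = parents.get(item, ())
            memo[item] = 0 if not ps else 1 + max(level(p) for p in ps)
        return memo[item]

    for item in parents:
        level(item)
    return memo

# Hypothetical proof: a1 and a2 are input items; 1, 2, and 3 are deduced steps.
proof = {"a1": (), "a2": (), 1: ("a1", "a2"), 2: (1, "a1"), 3: (1, 2)}
item_levels = levels(proof)
print(item_levels[1], item_levels[2], item_levels[3])  # 1 2 3
print(max(item_levels.values()))                       # 3
```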


6 Verification 


In this article, I have begun to make a case for the use of automated reasoning 
in the context of design and verification. Mainly, I have focused on design (im- 
plicitly, of circuits, chips, and programs). However, all things being equal, the 
simpler the design, the greater the ease of verification. Therefore, what has been 
discussed has some relevance to verification. Explicit is the position that the pro- 
perties of a proof that constructs some object are reflected in the nature of the 
object. For example, if the proof is strictly shorter than that in hand, then (quite 
often) the corresponding object rests on the use of fewer components (whatever 
they may be). For a second example, if the proof avoids the use of some type 
of term (such as double negation), then the constructed object avoids the use of 
some type of component. 

As for additional topics that appear to merit mention, perhaps the following 
are among them. OTTER can be and has been used to show that one of a set of 
axioms is dependent on the remainder. For design, the parallel might be that of 
showing that some thought-to-be key property that must be studied, in addition 
to the rest, in fact is dependent on the rest. If the remaining properties are shown 
to hold, then the cited key property must also hold, with no need to verify it 
separately. Fitelson and I have also succeeded in proving that a weakening of some 
well-recognized axiom system suffices to axiomatize the area of discourse. The 
analogue might be that of showing that some key property can be replaced by 
a far weaker property, one that is easier to satisfy and easier to verify. 


7 Review and Summary 


The approach taken in this article is to merely sketch various notions, to provide 
hints or clues as to what I conjecture to be more than feasible. Although I claim 
no expertise in design, synthesis, and verification, my research has yielded some 
startling results in mathematics and in logic. Some of those results concern the 
answering of open questions, whose analogue might be the design of a radically 
new nature. Some of the results focus on proof betterment: shorter, less complex, 
term-avoidance, and the like. The analogues of those have been discussed, 
although not in the greatest depth. 

The beauty of relying on a program such as McCune’s OTTER is that its 
proofs are most detailed. Another charming and useful aspect of its proofs is 
that they very often differ sharply from the type of proof an unaided researcher 
finds. This program offers a veritable arsenal of weapons from which to choose 
when attacking a question or problem, as well as diverse mechanisms pertinent 
to powerful reasoning. The program runs with incredible speed and, in contrast 
to living creatures, tirelessly. 

As discussed here, through the use of the resonance strategy, the presence of 
an actual design can be put to great use when the goal is to refine and improve 
it in some manner. However, if the various successes in answering open questions 
point in the right direction, the lack of a design does not prevent the finding 
of a desired object; indeed, one can start from scratch, as I and my colleague 
Branden Fitelson have done in areas of logic. Of course, starting from scratch 
presents a more difficult problem to solve, especially when no clues are offered 
of any type regarding the nature of a possible proof. 

The material sketched here might be timely, in view of the current interest 
in theorem proving by members of industry that include Intel and AMD. I sus- 
pect that (perhaps) many of the items discussed here offer a new notion, even 
to those familiar with automated reasoning. I cannot measure at this time the 
practicality. Certainly, one obstacle is sequential design in contrast to combina- 
tional, that concerned with time and with delay, for example. Nevertheless, I 
await (with pleasure and anticipation) your examination of and comment on the 
ideas presented here. I conjecture that a program such as OTTER will provide a 
most valuable automated reasoning assistant for design and synthesis—it clearly 
has for us in mathematics and in logic. 